#Like we have some example datasets where he already put all the solutions in and stuff
Anyone possibly know how to do a meta-analysis? I mean I sure as hell have no idea what to do with the data my professor sent me because he never actually got around to showing us wtf to do with it and how to use that damned statistics software.
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment of our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, which we call “winners vs losers,” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach by manually grouping pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, as is typically the case with ecommerce sites. In part three, we will learn something really exciting: how to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact that the URL groups had clear patterns (collections, products, and others), but it is often the case that there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents because most page templates have different content structures. They serve different user needs, so that needs to be the case.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
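As a rough illustration of this manual, selector-based grouping, here is a minimal sketch using only Python's standard library. The XPath expression and the two tiny page snippets are hypothetical; a real XPath would be copied from the browser's inspector for the site in question, and real-world HTML would usually need a more forgiving parser than the one used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath for the big product image (an assumption for this sketch).
PRODUCT_IMAGE_XPATH = ".//div[@id='product-image']/img"


def classify_page(markup: str) -> str:
    """Label a page 'product' if the product-image element is present."""
    root = ET.fromstring(markup)  # only works on well-formed markup
    found = root.find(PRODUCT_IMAGE_XPATH)
    return "product" if found is not None else "other"


# Invented, well-formed snippets standing in for crawled pages.
product_page = (
    "<html><body><div id='product-image'>"
    "<img src='big.jpg'/></div></body></html>"
)
category_page = (
    "<html><body><ul><li><img src='thumb1.jpg'/></li></ul></body></html>"
)

print(classify_page(product_page))
print(classify_page(category_page))
```

Each new page group would need its own hand-picked selector, which is exactly the manual effort the rest of this post tries to avoid.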
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags, such as class IDs, CSS style sheet names, and others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements, as my feature set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and more effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and write some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
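For static HTML, the counting step alone can be sketched without a browser. The example below is an assumption-laden simplification, not the gist's approach: it uses the standard library's `html.parser` to count `img`, `form`, and `input` tags in a made-up snippet, and it ignores images loaded by JavaScript and the image file sizes that the real dataset includes.

```python
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Counts the tags used as features: images, forms, and inputs."""

    def __init__(self):
        super().__init__()
        self.counts = {"img": 0, "form": 0, "input": 0}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img/> are also routed through here.
        if tag in self.counts:
            self.counts[tag] += 1


def count_elements(html: str) -> dict:
    parser = ElementCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Invented snippet standing in for a product page.
sample = (
    "<form><input name='qty'><input name='color'></form>"
    "<img src='big.jpg'><img src='thumb.jpg'/>"
)
print(count_elements(sample))
```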
Here we load the raw data already collected.
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both the form elements and the input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Most pages contain multiple images, so there are many rows for each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We use a little trick to compensate for this: we capture the size of the image files, which tends to grow with the product of the image’s width and height.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
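To make binning and one-hot encoding concrete, here is a minimal sketch in plain Python. The equal-width bucket scheme, the function names, and the sample file sizes are assumptions for illustration; the post only states that 50 groups are used, not how the bucket edges are chosen.

```python
def bin_index(value: float, max_value: float, n_bins: int = 50) -> int:
    """Map a value in [0, max_value] to one of n_bins equal-width buckets."""
    if max_value <= 0:
        return 0
    clipped = min(max(value, 0), max_value)
    # The top edge would land in bucket n_bins, so cap it at the last bucket.
    return min(int(clipped * n_bins / max_value), n_bins - 1)


def one_hot(index: int, size: int) -> list:
    """Return a vector of zeros with a single 1 at the given position."""
    vector = [0] * size
    vector[index] = 1
    return vector


# Invented image file sizes in bytes; 250,000 is treated as the maximum.
file_sizes = [1_200, 48_000, 250_000]
bins = [bin_index(size, max_value=250_000) for size in file_sizes]
encoded = [one_hot(b, 50) for b in bins]
print(bins)
```

With one-hot vectors, bucket 49 is not treated as "more" than bucket 9; each bucket is simply its own on/off column.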
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data.
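The random split described above can be sketched in a few lines. This is a simplified stand-in written with the standard library; the function name, the 80/20 ratio, and the placeholder rows are assumptions, and libraries such as scikit-learn ship their own helper for this.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows: (url, label) pairs standing in for the labeled dataset.
rows = [(f"/page-{i}", "product" if i % 2 else "category") for i in range(10)]
train, test = split_rows(rows)
print(len(train), len(test))
```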
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known scikit-learn Python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). Scikit-learn will run through all of them to find the best one. We simply need to feed in the X variables (the engineered features above) and the Y variables (the correct labels) to each model, call the .fit() method, and voila!
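The idea behind a grid search can be shown without any library: try every combination of settings and keep the one that scores best. The sketch below is a toy, not the models used in this post. The "model" is a hypothetical two-threshold rule, the six pages are invented, and scoring happens on the same data used for the search, whereas a real grid search would score each combination with cross-validation.

```python
from itertools import product

# Invented pages: (largest_image_bytes, input_count, label).
PAGES = [
    (220_000, 4, "product"),
    (180_000, 3, "product"),
    (150_000, 5, "product"),
    (30_000, 1, "category"),
    (25_000, 0, "category"),
    (40_000, 3, "category"),
]


def predict(page, size_threshold, min_inputs):
    """Toy rule: big image plus enough inputs means a product page."""
    size, inputs, _label = page
    is_product = size >= size_threshold and inputs >= min_inputs
    return "product" if is_product else "category"


def accuracy(pages, **params):
    correct = sum(predict(page, **params) == page[2] for page in pages)
    return correct / len(pages)


def grid_search(pages, grid):
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(pages, **params)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score


grid = {"size_threshold": [10_000, 100_000, 300_000], "min_inputs": [0, 3, 6]}
best_params, best_score = grid_search(PAGES, grid)
print(best_params, best_score)
```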
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974), with logistic regression (0.968) coming in a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well labeling products, but terribly labeling pages that are neither products nor categories. Intuitively, we can assume that such pages would not have consistent image usage.
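To make the diagonal-versus-off-diagonal reading concrete, a confusion matrix can be tallied by hand as in the sketch below. The labels and predictions are invented for illustration and are not this post's results; rows are true labels and columns are predicted labels.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where row = true label and column = predicted label."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


labels = ["product", "category", "other"]
# Invented ground truth and predictions for six pages.
y_true = ["product", "product", "category", "category", "other", "other"]
y_pred = ["product", "product", "category", "product", "category", "other"]

matrix = confusion_matrix(y_true, y_pred, labels)
correct = sum(matrix[i][i] for i in range(len(labels)))  # diagonal cells
accuracy = correct / len(y_true)

for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
```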
Here is the code to put together the confusion matrix:
Finally, here is the code to plot the model evaluation:
Resources to learn more
You might be thinking that this is a lot of work just to tell page groups apart, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a PyData event. I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share them in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
This Scaling Tech Could Let You Sync Bitcoin Straight From Your Phone
“Maybe we don’t have to store everything ourselves.”
That’s Tadge Dryja, cryptocurrency research scientist at the MIT Digital Currency Initiative, explaining the concept behind his bitcoin scaling solution, “utrexxo.”
Based on an idea that has been pursued by developers for many years, utrexxo seeks to streamline an aspect of bitcoin’s code that leads to heavy storage requirements over time.
Simply put, it addresses what is known as the UTXO set – or the data that gives information on whether a bitcoin has been spent.
Currently, bitcoin nodes must download the entirety of this information, what is known as the “state,” in order to verify it.
With utrexxo, though, rather than having to download the entirety of the bitcoin state, bitcoin holders could simply verify if it is correct using a cryptographic proof. This approach could minimize storage requirements to the extent that it might even be possible to run bitcoin on a mobile phone.
Also known as an accumulator, the tech underpinning utrexxo isn’t a new idea – developers have been discussing ways to implement similar kinds of code since bitcoin’s early days – but it was previously met with hurdles to implementation.
Now, due to work by Dryja and others, it is swiftly becoming a reality. In an early prototype, Dryja has created functioning proof-of-concept code.
And he’s not alone. Dryja is joined by cryptography heavyweights Dan Boneh, Benedikt Bünz and Ben Fisch, who have written a paper detailing an alternate accumulator method.
“The high-level goal is basically your phone could run a full node. That is the dream,” Bünz, who is known for his work on bulletproofs, a scaling tech that allowed monero to reduce transaction fees by 96 percent, told CoinDesk.
Bünz’s paper has even been picked up by ethereum researchers, who are investigating how the technology might apply to the layer-two scaling solution Plasma.
And part of this flurry of activity stems from the fact that due to the nature of the technology, it doesn’t require a hard fork – a type of software update that requires unanimous support and participation – in order to safely activate. Instead, accumulators would be deployed at the wallet level, which significantly reduces the hurdle to implementation.
“Hard forks are almost impossible on bitcoin. Soft forks are hard as well,” Bünz said, adding:
“It’s great that we can just deploy it, it makes it a lot easier and it means we can have a competition of ideas.”
Growing bigger
Stepping back, accumulators have been discussed since as early as 2010 but previously faced an insurmountable bottleneck – what is known as a bridge node.
And that’s because, in order to function, accumulators require other people within the network to support the software. While previously, this was highly resource-intensive, Dryja has built a bridge node that doesn’t come with additional trade-offs – meaning that accumulators are now feasible for the first time.
According to Dryja, that’s notable because utrexxo could address what has been a long-term pressure point for bitcoin: its increasing UTXO set.
UTXO – which stands for unspent transaction output – is the data structure that gives information about all the outstanding bitcoins on the network.
While it is known to fluctuate (the UTXO count actually decreased in 2018), the dataset tends to increase alongside bitcoin’s usage. This means that, if left unchecked, it could continue to grow, necessitating ever-increasing storage requirements.
In particular, this is something that concerns what is known as a bitcoin “full node,” a type of node that keeps a history of every transaction ever made on bitcoin. Currently, a full node requires about 200 gigabytes of storage – just beyond what a conventional laptop can store.
With accumulators, though, full nodes no longer need to store all of the blockchain data in order to reach consensus about where coins are on the network. Instead, they can simply provide proofs that data is correct.
“The high level is this idea of separating the consensus away from the state,” Bünz summarized. “Anyone can now be a full node without having to store the data.”
Previously, mobile full nodes were addressed by a particular type of client called an SPV client, which requires light wallets to trust other full nodes to have the correct data. Because this comes with decreased security assumptions, accumulators are heralded as a way to achieve this without trade-offs.
“My hope is that the people who are currently running SPV wallets would be able to use [utrexxo] and get the same security of a full node, with the resource requirements that are more similar to SPV,” Dryja summarized.
The competition
But while they are both positioned toward the same goal, there are ways in which Dryja’s utrexxo model and the work by Bünz differ significantly as well.
First and foremost, Dryja’s work stands out in that it is much closer to deployment. For example, it already has a working prototype and functioning code. Equally, it uses simple mathematics – hash functions that are already familiar to bitcoin.
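To give a feel for how plain hash functions can let someone check that an item belongs to a set without storing the whole set, here is a minimal sketch of a Merkle inclusion proof. It is a simplified textbook construction and not the actual utrexxo design, which the article does not detail; the function names and the four placeholder items are assumptions. The verifier only needs the root hash and a short proof.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for the leaf at `index`.

    The proof is a list of (sibling_hash, sibling_is_left) pairs,
    one per tree level, from the bottom up.
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last node
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Placeholder items standing in for unspent outputs.
items = [b"utxo-a", b"utxo-b", b"utxo-c", b"utxo-d"]
root, proof = merkle_root_and_proof(items, index=2)
print(verify(b"utxo-c", proof, root))
print(verify(b"forged", proof, root))
```

The proof grows with the logarithm of the set size, so here the verifier handles two hashes instead of four items; the saving is what matters when the set holds millions of entries.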
Bünz’s design, on the other hand, is potentially more efficient and boasts more advanced features. Still, it uses mathematics that, according to Dryja, is riskier and more exotic than that of his own design.
For example, one stage of Bünz’s accumulators requires a kind of trusted setup: in short, a number that is the product of two secret numbers which, if revealed, could compromise its security.
“We’re using fancier maths to get different properties,” Bünz said,
“The high level differences is [utrexxo] is ready now, it’s based on a simpler thing, it’s based on simple hash function, which is a good thing, but ours has more advanced cool features like batching and aggregating which would be cool at some point.”
Additionally, Bünz’s paper has a section that may have implications for the world’s second largest blockchain, ethereum, as well.
Speaking to CoinDesk, Georgios Konstantopoulos – a researcher and developer for the ethereum layer-two scaling solution Plasma – said that due to its applicability, Bünz’s paper had attracted a lot of enthusiasm in the ethereum research community.
For example, Konstantopoulos said that Bünz’s accumulators could even be a more efficient replacement for the most fundamental data structure in ethereum, the Merkle tree. Additionally, accumulators could help solve a problem inherent to Plasma Cash, which requires users to store large transaction histories.
The enthusiasm was such that Konstantopoulos estimated 10 new designs for how Bünz’s accumulators could apply to ethereum have been proposed, prompting the researcher to undertake a “taxonomy” to analyze the viability of each idea.
He told CoinDesk:
“I’m generally very optimistic that we will find a UTXO compaction scheme for Plasma.”
A ways to go
Still, there’s work that remains on all fronts before the scaling solutions can be considered viable.
Konstantopoulos emphasized that while accumulators could theoretically be useful for ethereum on both layer one and layer two scaling solutions, work remains in order to fully investigate their practical viability.
And both Bünz and Dryja emphasized similar caution as well.
For example, while accumulators have the potential to allow full nodes on mobile phones in terms of storage, they will encounter other hurdles to implementation.
In Dryja’s model, he emphasized that in its current implementation the accumulator is only really useful for bottom of the range computers.
“If you have a fast computer this actually doesn’t help. It will not make much difference or make it slower. But if you have a crummy computer it will make a really big difference,” he continued,
“We want bitcoin to work on crummy computers as well.”
For Bünz’s paper, work remains in order to build a working implementation of the design, which may come with its own unanticipated research problems.
Plus, using the mobile phone as an example, Bünz said that while it would be technically feasible to deploy in terms of storage, the phone would need to be constantly online in order to function.
However, Bünz said that such problems can likely be overcome given sufficient research.
“This is one step of the way for getting us to a space where your mobile phone can run a full node,” Bünz said. “There’s nothing theoretically that stands in the way, we just need to be smart about how we do things.”
He continued:
“There needs to be a lot of new innovation happening, but thankfully there is, and it’s really possible.”
This news post is collected from CoinDesk
The post This Scaling Tech Could Let You Sync Bitcoin Straight From Your Phone appeared first on Click 2 Watch.
lucyariablog · 6 years
Are You Really Smart About How AI Works in Marketing?
In its widely talked about State of Marketing Report, Salesforce reports that just over half (51%) of marketers are using AI in one form or another, while another quarter plan to test it over the next two years.
A smaller study of over 500 search, content, and digital marketers by BrightEdge found that just 4% have implemented AI (that’s not a typo).
Who’s right? Salesforce, which reports one in two marketers is using AI, or BrightEdge, which puts the number at one in 25?
The answer may be “neither.” That’s because many marketers (and business leaders as a whole) are confused about which technologies are genuinely AI-powered and which simply rely on advanced algorithms and analytics.
As Luis Perez-Breva, head of MIT’s Innovation Teams Program and research scientist at MIT School of Engineering, explains, “Most of what the retail industry refers to as artificial intelligence isn’t AI.” He says many “confuse analyzing large amounts of data and profiling customers for artificial intelligence. Throwing data at machines doesn’t make machines (or anyone) smarter.”
Rather, AI’s promise is what is often called relevance at scale. It’s the ability of machines to crunch massive datasets and data lakes – structured and unstructured data – and optimize decision-making in a way that algorithm-enabled humans cannot achieve. Perhaps most importantly, in an AI-enabled system the machine learns and improves without human input.
Rather than ask, “How many marketers are using AI?,” the more apt question may be, “What are you doing with it?” Let’s examine some of the ways companies are using AI-led initiatives to make the most of AI’s promise.
Using AI for personalization
Marketers have long practiced personalization in content marketing, developing over time more sophisticated ways of personalizing the customer journey – whether through marketing automation and progressive profiling or using programmatic advertising to support our content path. The idea is that as we learn more about our customer or prospect and fill in information about that person’s needs, budgets, and interests, we can create unique, personalized experiences that educate and delight the person.
Now we are entering the era of hyper-personalization: the ability to personalize not just by persona, profile, or the trail of breadcrumbs people leave on your site, but by a massive set of user details and signals, analyzed and made actionable by machines.
The retail industry is the most talked about application of AI-led personalization, but most examples you read about don’t really fit the definition of AI … they’re just really good personalization.
The examples that seem to cross over – from algorithm-driven personalization to AI-driven personalization – are those in which the AI sifts through data from multiple channels and sources, learning which signals matter in which circumstances and evolving its approach over time. The key variables that influence how one customer interacts with your brand may be completely different from the variables that define another, multiplied millions of times across each person, each channel, and each step of the process – and changing constantly.
Using AI for voice-searchable entertainment and education
A less common but exciting application for AI-enriched content? Virtual assistants. Alexa (Amazon) offers developers the chance to build “skills” on its platform. Alexa Skills help customers answer questions, gather information, and even control internet-enabled devices and appliances. (To be fair, there’s disagreement about whether Alexa is an AI technology or just an advanced natural language technology – another nod to the problem of assessing AI adoption.)
Companies far and wide are racing to launch Alexa Skills – both to inform and delight customers as well as to test out the channel’s promise.
Content-rich brands are delivering entertainment and information via Alexa Skills. Disney’s Character of the Day Skill introduces a new character each day from Disney, Pixar, Marvel, and Star Wars. Or you could try out Cat Translator to understand the “why” behind weird cat behavior.
Real-time news
Media companies have been among the first to offer content snippets via Alexa Skills. If you enable the NPR News Hour Skill, for example, you’ll have access to a five-minute news summary, refreshed every hour. Big brands are quickly jumping in too. J.P. Morgan customers can access investment news: “Send me the latest research report from Joyce Chang” or “Send me the tear sheet for eBay.”
Customer service and engagement
Global consumer brands are enabling e-commerce, customer service, and analytics using Alexa Skills. The Capital One Skill lets you ask Alexa, “How much did I spend at Target last month?” or “When is my mortgage payment due?”
For content marketers, there are interesting opportunities to deliver education and entertainment via voice-enabled search. Beauty brand Wunder2 was the first in its segment to launch an Amazon Alexa Skill. The company offers a daily beauty tip via Skills, from how to thicken the appearance of your brows to how to achieve healthier looking hair. As one reviewer explained, “It’s very cool when I can get the latest beauty tips while having my hands free to apply my makeup.”
Wunder2 co-founder and CEO Michael Malinsky tells Forbes, “As a business, we are fascinated with the rapid integration of AI into people’s lives. We think the level of adoption will exceed many people’s expectation and create fluid recommendation experiences using AI technology found in Google Home, Alexa, and the recently launched Apple HomePod. It is something we are absolutely developing already.”
Using AI to put email on steroids
For marketers, AI-enabled decision-making for customizing and delivering email (i.e., dynamic emails) could be a game-changer.
Once upon a time, marketers would ask, “What’s the best time of day to send out our email newsletter?” Through trial and error, marketers discovered that certain days and times yielded higher open rates on average.
AI, however, allows marketers to send emails based on the open histories of individual users (or of similar users in the absence of better data). And no longer will marketers send promotions to huge swaths of their audience. Instead, promotions will be designed uniquely for prospects based on a wide range of signals, from cart abandonment in retail to which times of day an individual is most likely to sign up for a conference. Finally, AI will enable much more customized and nuanced customer journeys. That leads to our next AI application – one which is too often misunderstood.
Using AI to write
Long decried as evidence that AI will usher in a new soulless age, machine-made content is one of the most controversial applications of AI … but, under the right circumstances, it may be the most pro-creative. Let me explain.
As machine-made content becomes better at approximating human language, there’s a clear case for its use in content marketing. Not all content generated by marketing needs to be highly creative and witty, after all. Many organizations are already using machine-generated content, such as Edmunds generating vehicle profiles based on manufacturer data and Homesnap publishing community profiles based on publicly available data. The best applications are those in which there’s a need to publish at scale and the content is somewhat “modular” or easily put together from pieces and parts.
And, if you’re not convinced, perhaps this will change your tune. Even The Washington Post uses machine-generated content. According to Digiday, as of September 2017, the paper’s robot writer (a solution from Heliograph) had published 850 articles and tweets like this one:
Landon beat Whitman 34-0; https://t.co/V6zVPi7a9O @LandonSports @koachkuhn
— WashPost HS Sports (@WashPostHS) September 2, 2017
The key is in how you pair the robot to the writing. For The Washington Post, Heliograph generated articles about local political races, where the paper didn’t have the resources to assign reporters but had data to fill in the story. It also published short summaries about the Olympics in Rio via machine. (The paper reports that four employees previously took 25 hours to collect, analyze, and report on a small portion of local election results. Using Heliograph, The Washington Post created more than 500 articles generating 500,000 views.)
And therein lies the most powerful promise of AI: to release marketers from the mundane to focus on more creative and fulfilling efforts. Marvin Chow, vice president of global marketing at Google, writes that artificial intelligence and machine learning “will spark new ideas and push the boundaries of creativity. With new tools, what will makers, artists, and musicians design? And how will that affect the marketing world we work in?” The full vision is still out of reach, but early signs point to a machine-led period of creative efficiency.
A version of this article originally appeared in the August issue of  Chief Content Officer. Sign up to receive your free subscription to our print magazine every quarter.
Cover image by Joseph Kalinowski/Content Marketing Institute
The post Are You Really Smart About How AI Works in Marketing? appeared first on Content Marketing Institute.
from https://contentmarketinginstitute.com/2018/08/ai-works-marketing/