#Like we have some example datasets where he already put all the solutions in and stuff
zarafey · 2 years
Text
Anyone possibly know how to do a meta-analysis? I mean I sure as hell have no idea what to do with the data my professor sent me because he never actually got around to showing us wtf to do with it and how to use that damned statistics software.
0 notes
srasamua · 5 years
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment in our series on using Python to speed up SEO traffic recovery. In part one, I explained how our unique approach, which we call “winners vs losers,” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach by manually grouping pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, as is typically the case with ecommerce sites. In part three, we will learn something really exciting: we will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in parts one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact that the URL groups had clear patterns (collections, products, and the others), but it is often the case that there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their content because most page templates have different content structures. They serve different user needs, so their structures need to differ.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
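As a rough illustration, here is how that presence check could be scripted in Python with requests and lxml (the URL and XPath below are placeholders, not values from the notebook):
import requests
from lxml import html

# Placeholder URL and XPath; substitute the XPath copied from Chrome's "Inspect" panel
url = "https://example.com/some-product-page"
product_image_xpath = '//*[@id="main-product-image"]'

response = requests.get(url)
tree = html.fromstring(response.content)

# If the XPath matches at least one element, treat the page as a product detail page
is_product_page = len(tree.xpath(product_image_xpath)) > 0
print(is_product_page)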
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements, as my feature set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and more effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs on millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and write some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
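A minimal sketch of that loading step with pandas, assuming the crawl results were exported as CSV files (the file names and column layouts here are illustrative, not the notebook’s exact ones):
import pandas as pd

# Illustrative file names; the actual data lives in the Google Colab notebook
form_counts = pd.read_csv("form_counts.csv")  # one row per URL: form and input element counts
img_counts = pd.read_csv("img_counts.csv")    # one row per image: url, file size, height, width

print(form_counts.head())
print(img_counts.head())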
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images on each page, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions, so we are using a little trick to compensate for this: we capture the size of the image files, which should be roughly proportional to the product of the width and height of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to achieve this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
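A minimal sketch of the binning and one-hot encoding steps with pandas, assuming the img_counts data frame described above (the url and img_size column names are assumptions):
import pandas as pd

# Bin each image's file size into one of 50 buckets ("binning")
img_counts["size_bin"] = pd.cut(img_counts["img_size"], bins=50, labels=False)

# One-hot encode the bucket label, then aggregate per URL so each page ends up
# with a count of images falling into each size bucket
size_dummies = pd.get_dummies(img_counts["size_bin"], prefix="size_bin")
img_features = pd.concat([img_counts[["url"]], size_dummies], axis=1).groupby("url").sum()

print(img_features.head())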
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them as the ground truth labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data and test it on another set that it has never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data.
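A minimal sketch of that split with scikit-learn, assuming a feature matrix X (the engineered features per URL) and a label vector y (the page groups from the regex approach) have already been assembled:
from sklearn.model_selection import train_test_split

# Hold out 30% of the pages as unseen test data; stratify so every page group
# is represented in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)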
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known scikit-learn Python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). Scikit-learn will run through all of them to find the best one. We simply need to feed the X variables (our engineered features from above) and the Y variables (the correct labels) into each model, call the .fit() function, and voila!
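As a rough sketch of that model comparison (the candidate models and hyperparameter grids below are illustrative choices rather than the notebook’s exact ones):
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

candidates = {
    "linear_svm": (LinearSVC(max_iter=10000), {"C": [0.01, 0.1, 1, 10]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
}

best_models = {}
for name, (model, param_grid) in candidates.items():
    search = GridSearchCV(model, param_grid, cv=5)  # try every hyperparameter combination
    search.fit(X_train, y_train)                    # the training split from above
    best_models[name] = search
    print(name, search.best_score_, search.best_params_)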
Evaluating performance
After running the grid search, we find our winning model to be the linear SVM (0.974), with logistic regression (0.968) coming in a close second. Even with such high accuracy, a machine learning model will still make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions, and the counts outside the diagonal are failures. In the confusion matrix above, we can quickly see that the model does really well labeling products, but terribly at labeling pages that are neither products nor categories. Intuitively, we can assume that such pages do not have consistent image usage.
The code to put together the confusion matrix is in the Colab notebook linked above. A minimal sketch of that step, assuming the winning model and the test split from the previous sections, might look like this:
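from sklearn.metrics import confusion_matrix

# Predict page groups for the held-out test set with the winning model
y_pred = best_models["linear_svm"].predict(X_test)

# Rows are the true page groups and columns the predicted ones;
# the diagonal holds the correct predictions
cm = confusion_matrix(y_test, y_pred)
print(cm)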
Finally, the code to plot the model evaluation is also in the notebook. One common way to plot the confusion matrix, assuming the cm array computed above, is sketched here:
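import matplotlib.pyplot as plt
import seaborn as sns

group_names = sorted(y_test.unique())  # page-group labels, sorted to match confusion_matrix's default ordering
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", xticklabels=group_names, yticklabels=group_names)
plt.xlabel("Predicted group")
plt.ylabel("True group")
plt.title("Confusion matrix for page-group classification")
plt.show()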
Resources to learn more
You might be thinking that this is a lot of work just to tell page groups apart, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a PyData event. I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share it in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
from Digital Marketing News https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/
2 notes
analyticsindiam · 5 years
Text
Intel Readies For An AI Revolution With A Comprehensive AI Solutions Stack
Global technology player Intel has been a catalyst for some of the most significant technology transformations in the last 50 years, preparing its partners, customers and enterprise users for a digital era. In the area of artificial intelligence (AI) and deep learning (DL), Intel is at the forefront of providing end-to-end solutions that are creating immense business value.
But there’s one more area where the technology giant is playing a central role. Intel is going to the heart of the developer community by providing a wealth of software and developer tools that can simplify building and deployment of DL-driven solutions and take care of all computing requirements, so that data scientists, machine learning engineers and practitioners can focus on delivering solutions that grant real business value. The company’s software offerings provide a range of options to meet the varying needs of data scientists, developers and researchers at various levels of AI expertise.
So, why are AI software development tools more important now than ever? As architectural diversity increases and the compute environment becomes more sophisticated, the developer community needs access to a comprehensive suite of tools that can enable them to build applications better, faster and more easily and reliably without worrying about the underlying architecture. What Intel is primarily doing is empowering coders, data scientists and researchers to become more productive by taking away the code complexity.
Intel Makes AI More Accessible For The Developer Community
In more ways than one, software has become the last mile between the developers and the underlying hardware infrastructure, enabling them to utilise the optimization capabilities of processors. Analytics India Magazine spoke to Akanksha Bilani, Country Lead – India, Singapore, ANZ at Intel Software to understand why, in today’s world, transformation of software is key to driving effective business, usage models and market opportunity.
“Gone are the days where adding more racks to existing platforms helped drive productivity. Moore’s law and AI advocates that the way to take advantage of hardware is by driving innovation on software that runs on top of it. Studies show that modernization, parallelisation and optimization of software on the hardware helps in doubling the performance of our hardware,” she emphasizes.
Going forward, the convergence of architecture innovation and optimized software for platforms will be the only way to harness the potential of future paradigms of AI, High Performance Computing (HPC) and the Internet of Everything (IoE). Intel’s Naveen Rao, Corporate Vice President and General Manager, Artificial Intelligence Products Group at Intel Corporation, summed up the above statement at the recently concluded AI Hardware1 summit. It’s not just a ‘fast chip’ - but a portfolio of products with a software roadmap that can enable the developer community to leverage the capabilities of the new AI hardware. “AI models are growing by 2x every 3 months. So it will take a village of technologies to meet the demands: 2x by software, 2x by architecture, 2x by silicon process and 4x by interconnect,” he stated.
Simplifying AI Workflows With Intel® Software Development Tools
As the global technology major leads the way forward in data-driven transformation, we are seeing Intel® Software2 solutions open up a new set of possibilities across multiple sectors.
In retail, the Intel® Distribution of OpenVINO™ Toolkit is helping business leaders3 take advantage of near real-time insights to help make better decisions faster. Wipro4 has built groundbreaking edge AI solutions on server class Intel® Xeon® Scalable Processors and the Intel® Distribution of OpenVINO™ Toolkit.
Today, data scientists who are building cutting-edge AI algorithms rely very heavily on Intel® Distribution for Python to get higher performance gains. While stock Python products bring a great deal of performance to the table, the Intel performance libraries that come already plugged in with Intel® Distribution for Python help programs gain more significant speed-ups as compared to the open source scikit-learn.
Now, those working in distributed environments leverage BigDL, a DL library for Apache Spark. This distributed DL library helps data scientists accelerate DL inference on CPUs in their Spark environment. “BigDL is an add-on to the machine learning pipeline and delivers an incredible amount of performance gains,” Bilani elaborates.
Then there’s also Intel® Data Analytics Acceleration Library (Intel® DAAL), widely used by data scientists for its range of algorithms, ranging from the most basic descriptive statistics for datasets to more advanced data mining and machine learning algorithms. For every stage in the development pipeline, there are tools providing APIs, and it can be used with other popular data platforms such as Hadoop, Matlab, Spark and R.
There is also another audience that Intel caters to — the tuning experts who really understand their programs and want to get the maximum performance out of their architecture. For these users, the company offers its Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN) — an open source, performance-enhancing library which has been abstracted to a great extent to allow developers to utilise DL frameworks featuring optimized performance on Intel hardware. This platform can accelerate DL frameworks on Intel architecture, and developers can also learn more about this tool through tutorials.
The developer community is also excited about yet another ambitious undertaking from Intel, which will soon be out in beta and that truly takes away the complexity brought on by heterogeneous architectures. OneAPI, one of the most ground-breaking multi-year software projects from Intel, offers a single programming methodology across heterogeneous architectures. The end benefit to application developers is that they need no longer maintain separate code bases, multiple programming languages, and different tools and workflows, which means that they can now get maximum performance out of their hardware.
As Prakash Mallya, Vice President and Managing Director, Sales and Marketing Group, Intel India, explains, “The magic of OneAPI is that it takes away the complexity of the programme and developers can take advantage of the heterogeneity of architectures which implies they can use the architecture that best fits their usage model or use case. It is an ambitious multi-year project and we are committed to working through it every single day to ensure we simplify and not compromise our performance.”
According to Bilani, the bottom line of leveraging OneAPI is that it provides an abstracted, unified programming language that actually delivers one view/OneAPI across all the various architectures. OneAPI will be out in beta in October.
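As a rough illustration of what this kind of drop-in acceleration looks like in code, here is a sketch using the open-source scikit-learn-intelex package (a later open-source packaging of these optimizations, used here purely as an example rather than the exact distribution described above):
# scikit-learn-intelex patches scikit-learn so that supported estimators run on
# Intel's oneDAL-accelerated implementations
from sklearnex import patch_sklearn
patch_sklearn()

# After patching, ordinary scikit-learn code runs on the accelerated back end
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100_000, 20)
kmeans = KMeans(n_clusters=8, random_state=0).fit(X)
print(kmeans.inertia_)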
How Intel Is Reimagining Computing
As architectures get more diverse, Intel is doubling down on a broader roadmap for domain-specific architectures coupled with simplified software tools (libraries and frameworks) that enable abstraction and faster prototyping across its comprehensive AI solutions stack. The company is also scaling adoption of its hardware assets — CPUs, FPGAs, VPUs and the soon to be released Intel Nervana™ Neural Network Processor product line. As Mallya puts it, “Hardware is foundational to our company. We have been building architectures for the last 50 years and we are committed to doing that in the future but if there is one thing I would like to reinforce, it is that in an AI-driven world, as data-centric workloads become more diverse, there’s no single architecture that can fit in.”
That’s why Intel focuses on multiple architectures — whether it is scalar (CPU), vector (GPU), matrix (AI) or spatial (FPGA). The Intel team is working towards offering more synchrony between all the hardware layers and software. For example, Intel Xeon Scalable processors have undergone generational improvements and are now seeing a drift towards instructions which are very specific to AI. Vector Neural Network Instruction (VNNI), built into the 2nd Generation Intel Xeon Scalable processors, delivers enhanced AI performance. Advanced Vector Extensions (AVX), on the other hand, are instructions that have already been a part of Intel Xeon technology for the last five years. While AVX allows engineers to get the performance they need on a Xeon processor, VNNI enables data scientists and machine learning engineers to maximize AI performance.
Here’s where Intel is upping the game in terms of heterogeneity — from generic CPUs (2nd Gen Intel Xeon Scalable processors) running specific instructions for AI to actually having a complete product built for both training and inference. Earlier in August at Hot Chips 2019, Intel announced the Intel Nervana Neural Network processors4, designed from the ground up to run full AI workloads that cannot run on GPUs, which are more general purpose.
The bottom line:
a) Deploy AI anywhere with unprecedented hardware choice
b) Software capabilities that sit on top of hardware
c) Enriching community support to get up to speed with the latest tools
Winning the AI Race
For Intel, the winning factor has been staying closely aligned with its strategy of a ‘no one size fits all’ approach and ensuring its evolving portfolio of solutions and products stays AI-relevant. The technology behemoth has been at the forefront of the AI revolution, helping enterprises and startups operationalize AI by reimagining computing and offering full-stack AI solutions, spanning software and hardware, that add additional value to end customers. Intel has also heavily built up a complete ecosystem of partnerships and has made significant inroads into specific industry verticals and applications like telecom, healthcare and retail, helping the company drive long-term growth. As Mallya sums up, the way forward is through meaningful collaborations and making the vision of AI for India a reality using powerful best-in-class tools.
Sources
1. AI Hardware Summit: https://twitter.com/karlfreund
2. Intel Software Solutions: https://software.intel.com/en-us
3. Accelerate Vision Anywhere With OpenVINO™ Toolkit: https://www.intel.in/content/www/in/en/internet-of-things/openvino-toolkit.html
4. At Hot Chips, Intel Pushes ‘AI Everywhere’: https://newsroom.intel.com/news/hot-chips-2019/#gs.8w7pme
0 notes
toldnews-blog · 6 years
Photo
New Post has been published on https://toldnews.com/business/virtual-cities-designing-the-metropolises-of-the-future/
Virtual cities: Designing the metropolises of the future
Image copyright Getty Images
Image caption The cities of the future will be informed by data as much as by design
Simulation software that can create accurate “digital twins” of entire cities is enabling planners, designers and engineers to improve their designs and measure the effect changes will have on the lives of citizens.
Cities are hugely complex and dynamic creations. They live and breathe.
Think about all the parts: millions of people, schools, offices, shops, parks, utilities, hospitals, homes and transport systems.
Changing one aspect affects many others. Which is why planning is such a hard job.
So imagine having a tool at your disposal that could answer questions such as “What will happen to pedestrian and traffic flow if we put the new metro station here?” or “How can we persuade more people to leave their cars at home when they go to work?”
This is where 3D simulation software is coming into its own.
Architects, engineers, construction companies and city planners have long used computer-aided design and building information modelling software to help them create, plan and construct their projects.
But with the addition of internet of things (IoT) sensors, big data and cloud computing, they can now create “digital twins” of entire cities and simulate how things will look and behave in a wide range of scenarios.
“A digital twin is a virtual representation of physical buildings and assets but connected to all the data and information around those assets, so that machine learning and AI algorithms can be applied to them to help them operate more efficiently,” explains Michael Jansen, chief executive of Cityzenith, the firm behind the Smart World Pro simulation platform.
Take Singapore as an example.
Image copyright NRF Singapore
Image caption The real Singapore has been faithfully recreated in virtual form
Image copyright NRF Singapore
Image caption Planners now have a data-rich simulation of the city to interact with
This island state, sitting at the foot of the Malaysian peninsula with a population of six million people, has developed a virtual digital twin of the entire city using software developed by French firm Dassault Systemes.
“Virtual Singapore is a 3D digital twin of Singapore built on topographical as well as real-time, dynamic data,” explains George Loh, programmes director for the city’s National Research Foundation (NRF), a department within the prime minister’s office.
“It will be the country’s authoritative platform that can be used by urban planners to simulate the testing of innovative solutions in a virtual environment.”
In addition to the usual map and terrain data, the platform incorporates real-time traffic, demographic and climate information, says Mr Loh, giving planners the ability to engage in “virtual experimentation”.
“For example, we can plan barrier-free routes for disabled and elderly people,” he says.
Bernard Charles, Dassault Systemes’ chief executive, says the addition of real-time data from multiple sources facilitates joined-up, holistic thinking.
Image copyright NRF Singapore
Image caption The city envisages Virtual Singapore being used by citizens to locate driverless cars for hire
“The problem is that when we decide about the evolution of a city we are in some way blind. You have the urban view of it – a map – you decide to put a building here, but another agency has to think about transport, another agency has to think about commercial use and flats for people.
“The creation of one thing changes so many other things – the flow and life of citizens.”
The firm’s 3DExperience platform gives planners and designers “a global overview” they’ve never had before, explains Mr Charles.
Dassault’s software, which incorporates calculations that simulate the flow of a fluid, is used to design most F1 cars and aeroplanes, says Mr Charles, and this capability is useful for understanding wind flow around buildings, through streets and green spaces.
Image copyright NRF
Image caption The software can model wind flow through built up areas
“If some parts of a city are too windy and cold, no-one will like to go there,” he says.
Tracking people’s movements through a city using anonymised mobile phone and transport GPS data can help authorities spot bottlenecks and heat maps as the day progresses, hopefully leading to smarter, more integrated transport and traffic management systems.
“You can look at all ‘what if’ scenarios, so if we ask the right question we can change the city, the world,” concludes Mr Charles.
Is India failing to build its newest state capital?
In the state of Andhra Pradesh in India, a brand new $6.5bn “smart city” called Amaravati has been planned since 2015, but has been mired in controversy amid disagreements over the designs and criticism of its environmental impact.
But last year Foster + Partners, the global architecture and engineering firm, and Surbana Jurong, the Asian urban and infrastructure consultancy, were chosen to take on the huge task.
And Chicago-based Cityzenith is providing the single “command and control” digital platform for the entire project.
Image copyright Cityzenith
Image caption Cityzenith’s Smart World Pro platform gives a real-time simulation of the entire Amaravati city project
IoT sensors will monitor construction progress in real time, says Mr Jansen, and the software will integrate all the designs from the 30 or so design consultants already involved in the first phase of the project.
“The portal will simulate the impact of these proposed buildings before anyone even breaks ground,” he says, “and these simulations will adjust to real-time changes.”
The platform can incorporate more than a thousand datasets, says Mr Jansen, and integrate all the various design and planning tools the designers and contractors use.
The city, which will eventually be home to 3.5 million people, will be hot and humid, experiencing temperatures approaching 50C at times, so simulating how buildings will cope with the climate will be crucial, says Mr Jansen.
One large Norwegian engineering consultancy, Norconsult, is even combining simulation software with gaming to help improve its designs.
When working on a large rail tunnel project in Norway, the firm developed a virtual reality game to involve train drivers in the design of the signalling system. The drivers operated a virtual train and “drove” it through the tunnel, flagging up any issues with the proposed position of the signals.
Image copyright Norconsult
Image caption Train drivers “drove” a virtual train through the tunnel to test the positions of the signals
“They could change weather conditions, the speed and so on,” says Thomas Angeltveit, who worked on the project. “It feels real, so it is much easier for them to interact.”
“We had a lot of comments, so we were able to change the design and make a lot of adjustments.”
Changing the design before construction begins obviously saves money in the long-term.
Digital twin simulation software is a fast-growing business, with firms such as Siemens, Microsoft and GE joining Dassault Systemes and Cityzenith as lead practitioners.
Research firm Gartner predicts that by 2021 half of large industrial companies will use digital twins and estimates that those that do could save up to 25% in operational running costs as a result.
The future of design is virtual and driven by data it seems.
Follow Matthew on Twitter and Facebook
0 notes
ladystylestores · 4 years
Text
A Silicon Valley for everyone – TechCrunch
Editor’s note: Get this free weekly recap of TechCrunch news that any startup can use by email every Saturday morning (7am PT). Subscribe here.
Many in the tech industry saw the threat of the novel coronavirus early and reacted correctly. Fewer have seemed prepared for its aftereffects, like the outflow of talented employees from very pricey office real estate in expensive and troubled cities like San Francisco.
And few indeed have seemed prepared for the Black Lives Matter protests that have followed the death of George Floyd. This was maybe the easiest to see coming, though, given how visible the structural racism is in cities up and down the main corridors of Silicon Valley.
Today, the combination of politics, the pandemic and the protests feels almost like a market crash for the industry (except many revenues keep going up and to the right). Most every company is now fundamentally reconsidering where it will be located and who it will be hiring — no matter how well it is doing otherwise.
Some, like Google and Thumbtack, have been caught in the awkward position of scaling back diversity efforts as part of pandemic cuts right before making statements in support of the protesters, as Megan Rose Dickey covered on TechCrunch this week. But it is also the pandemic helping to create the focus, as Arlan Hamilton of Backstage Capital tells her:
It is like the world and the country has a front-row seat to what Black people have to witness, take in, and feel all the time. And it was before they were seeing some of it, but they were seeing it kind of protected by us. We were kind of shielding them from some of it… It’s like a VR headset that the country is forced to be in because of COVID. It’s just in their face.
This is also putting new scrutiny on how tech is used in policing today. It is renewing questions around who gets to be a VC and who gets funding, right when the industry is under new pressure to deliver. It is highlighting changes that companies can make internally, like this list from BLCK VC on Extra Crunch.
As with police reforms currently in the national debate, some of the most promising solutions are local. Property tax reform, pro-housing activism and sustainable funding for homelessness services are direct ways for the tech industry to address the long history of discrimination where the modern tech industry began, Catherine Bracy of TechEquity writes for TechCrunch. These changes are also what many think would make the Bay Area a more livable place for everyone, including any startup and any tech employee at any tech company (see: How Burrowing Owls Lead To Vomiting Anarchists).
Something to think about as we move on to our next topic — the ongoing wave of tech departures from SF.
Where will VCs follow founders to now?
In this week’s staff survey, we revisit the remote-first dislocation of the tech industry’s core hubs. Danny Crichton observes some of the places that VCs have been leaving town for, and thinks it means bigger changes are underway:
“Are VCs leaving San Francisco? Based on everything I have heard: yes. They are leaving for Napa, leaving for Tahoe, and otherwise heading out to wherever gorgeous outdoor beauty exists in California. That bodes ill for San Francisco’s (and really, South Park’s) future as the oasis of VC.
But the centripetal forces are strong. VCs will congregate again somewhere else, because they continue to have that same need for market intelligence that they have always had. The new, new place might not be San Francisco, but I would be shocked just given the human migration pattern underway that it isn’t in some outlying part of the Bay Area.
And then he says this:
As for VCs — if the new central node is a bar in Napa and that’s the new “place to be” — that could be relatively more permanent. Yet ultimately, VCs follow the founders even if it takes time for them to recognize the new balance of power. It took years for most VCs to recognize that founders didn’t want to work in South Bay, but now nearly every venture firm of note has an office in San Francisco. Where the founders go, the VCs will follow. If that continues to be SF, its future as a startup hub will continue after a brief hiatus.
It’s true that another outlying farming community in the region once became a startup hub, but that one had a major research university next door, and at the time a lot of cheap housing if you were allowed access to it. But Napa cannot be the next Palo Alto because it is fully formed today as a glorified retirement community, Danny.
I’m already on the record for saying that college towns in general are going to become more prominent in the tech world, between ongoing funding for innovative tech work and ongoing desirability for anyone moving from the big cities. But I’m going to add a side bet that cities will come back into fashion with the sorts of startup founders that VCs would like to back. As Exhibit A, I’d like to present Jack Dorsey, who started a courier dispatch in Oakland in 2000, and studied fashion and massage therapy during the aftermath of the dot-com bubble. His success with Twitter a few years later in San Francisco inspired many founders to move as well.
Creative people like him are drawn to the big, creative environments that cities can offer, regardless of what the business establishment thinks. If the public and private sectors can learn from the many mistakes of recent decades (see last item) who knows, maybe we’ll see a more equal and resilient sort of boom emerge in tech’s current core.
Insurance provider Lemonade files for IPO with that refreshing common-stock flavor
There are probably some amazing puns to be made here but it has been a long week, and the numbers speak for themselves. Lemonade sells insurance to renters and homeowners online, and managed to reach a private valuation of $3.5 billion before filing to go public on Monday — with the common stockholders still comprising the majority of the cap table.
Danny crunched the numbers from the S-1 on Extra Crunch to generate the table, included, that illustrates this rather unusual breakdown. Usually, as you almost certainly know already, the investors own well over half by the time of a good liquidity event. “So what was the magic with Lemonade?” he ponders. “One piece of the puzzle is that company founder Daniel Schreiber was a multi-time operator, having previously built Powermat Technologies as the company’s president. The other piece is that Lemonade is built in the insurance market, which can be carefully modeled financially and gives investors a rare repeatable business model to evaluate.”
(Photo by Paul Hennessy/NurPhoto via Getty Images)
Adapting enterprise product roadmaps to the pandemic
Our investor surveys for Extra Crunch this week covered the space industry’s startup opportunities, and looked at how enterprise investors are assessing the impact of the pandemic. Here’s Theresia Gouw of Acrew Capital, explaining how two of their portfolio companies have refocused in recent months:
A common theme we found when joining our founders for these strategy sessions was that many pulled forward and prioritized mid- to long-term projects where the product features might better fit the needs of their customers during these times. One such example in our portfolio is Petabyte’s (whose product is called Rhapsody) accelerated development of its software capabilities that enable veterinarians to provide telehealth services. Rhapsody has also incorporated key features that enable a contactless experience when telehealth isn’t sufficient. These include functionality that enables customers to check-in (virtual waiting room), sign documents, and make payments from the comfort and safety of their car when bringing their pet (the patient!) to the vet for an in-person check-up.
Another such example would be PredictHQ, which provides demand intelligence to enterprises in travel, hospitality, logistics, CPG, and retail, all sectors who saw significant change (either positive or negative) in the demand for their products and services. PredictHQ has the most robust global dataset on real-world events. Pandemics and all the ensuing restrictions and, then, loosening of restrictions fall within the category of real-world events. The company, which also has multiple global offices, was able to incorporate the dynamic COVID government responses on a hyperlocal basis, by geography, and equip its customers (e.g., Domino’s, Qantas, and First Data) with up to date insights that would help with demand planning and forecasting as well as understanding staffing needs.
Around TechCrunch
Extra Crunch Live: Join Superhuman CEO Rahul Vohra for a live Q&A on June 16 at 2pm EDT/11 AM PDT
Join us for a live Q&A with Plaid CEO Zach Perret June 18 at 10 a.m. PDT/1 p.m. EDT
Two weeks left to save on TC Early Stage passes
Learn how to ‘nail it before you scale it’ with Floodgate’s Ann Miura-Ko at TC Early Stage SF
How can startups reinvent real estate? Learn how at TechCrunch Disrupt
Stand out from the crowd: Apply to TC Top Picks at Disrupt 2020
Across the Week
TechCrunch
Theaters are ready to reopen, but is America ready to go back to the movies?
Edtech is surging, and parents have some notes
When it comes to social media moderation, reach matters
Zoom admits to shutting down activist accounts at the request of the Chinese government
Extra Crunch
TechCrunch’s top 10 picks from Techstars’ May virtual demo days
Software’s meteoric rise: Have VCs gone too far?
Recession-proof your software engineering career
The complicated calculus of taking Facebook’s venture money
The pace of startup layoffs may be slowing down
#EquityPod
From Alex:
Hello and welcome back to Equity, TechCrunch’s venture capital-focused podcast, where we unpack the numbers behind the headlines.
After a pretty busy week on the show we’re here with our regular Friday episode, which means lots of venture rounds and new venture capital funds to dig into. Thankfully we had our full contingent on hand: Danny “Well, you see” Crichton, Natasha “Talk to me post-pandemic” Mascarenhas, Alex “Very shouty” Wilhelm and, behind the scenes, Chris “The Dad” Gates.
Make sure to check out our IPO-focused Equity Shot from earlier this week if you haven’t yet, and let’s get into today’s topics:
Instacart raises $225 million. This round, not unexpected, values the on-demand grocery delivery startup at $13.7 billion — a huge sum, and one that should make it harder for the well-known company to sell itself to anyone but the public markets. Regardless, COVID-19 gave this company a huge updraft, and it capitalized on it.
Pando raises $8.5 million. We often cover rounds on Equity that are a little obvious. SaaS, that sort of thing. Pando is not that. Instead, it’s a company that wants to let small groups of individual pool their upside and allow for more equal outcomes in an economy that rewards outsized success.
Ethena raises $2 million. Anti-harassment software is about as much fun as the dentist today, but perhaps that doesn’t have to be the case. Natasha talked us through the company, and its pricing. I’m pretty bullish on Ethena, frankly. Homebrew, Village Global and GSV took part in the financing event.
Vendr raises $4 million. Vendr wants to help companies cut their SaaS bills, through its own SaaS-esque product. I tried to explain this, but may have butchered it a bit. It’s cool, I promise.
Facebook is getting into the CVC game. This should not be a surprise, but we were also not sure who was going to want Facebook money.
And, finally, Collab Capital is raising a $50 million fund to invest in Black founders. Per our reporting, the company is on track to close on $10 million in August. How fast the fund can close its full target is something we’re going to keep an eye on, considering it might get a lot harder a lot sooner. 
And that is that; thanks for lending us your ears.
Equity drops every Friday at 6:00 am PT, so subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts.
Source link
from World Wide News https://ift.tt/2XXCWXM
0 notes
hudsonespie · 5 years
Text
Study: Around the World, Sustainable Fisheries Management is Working
Fisheries around the world are in better health than most people realize, according to a new study published last month in the journal PNAS. The study is the latest comprehensive health assessment of the world’s fish populations, and the data paints an improving picture, with many fisheries now able to provide a sustainable catch.
“There is a narrative that fish stocks are declining around the world, that fisheries management is failing, and we need new solutions. And it’s totally wrong,” said lead author Ray Hilborn, a fisheries expert at the University of Washington, who led the study. “Fish stocks are not all declining around the world. They are increasing in many places, and we already know how to solve problems through effective fisheries management.”
Key fishing grounds in Europe, South America and Africa are among those found to have healthy or improving numbers. But the good news has limits. The status of many unmanaged fisheries, especially those in South and Southeast Asia, are unclear. As global trade continues to increase demand, these regions are most likely being overexploited.
Compiled by fisheries scientists from around the world, the new analysis looked at data on 882 fish stocks, including information available for the first time about catches from Peru, Chile, Japan, Russia, north-west Africa and the Mediterranean and Black seas. The researchers then compared this to details of fisheries management in about 30 countries. They found that more intense management led to healthy or improving fish populations, while little to no management led to overfishing.
The study concludes: “The efforts of the thousands of managers, scientists, fishers and non-governmental organization workers have resulted in significantly improved statuses of fisheries in much of the developed world, and increasingly in the developing world.”
The study shows something else too: consensus and cooperation continues between two distinct camps of fisheries experts previously in conflict. The two sides – who disagreed on the health and likely future prospects of global fish populations – first combined to offer a joint assessment a decade ago. That 2009 analysis concluded many depleted fisheries were making good progress towards recovery. But the data used only covered about 20% of the world’s catch. In other words, the status of 80 percent of the fish landed every year across the globe remained a mystery.
Last week’s study is put together by a similar team of researchers and significantly extends the dataset, which now contains information on about half the world’s catch. The results, Hilborn says, show that consumers in the developed world – including in North America and Britain – can now buy many fish species with a clear conscience. “If you want to be very careful, you need to look at exactly what species it is,” he says. “But as a general rule, particularly those of us in the West, we’re largely eating fish that come from well-managed fisheries.”
There are some important exceptions. For example, shrimp is the most popular seafood in the US, and the majority is imported from unmanaged fisheries in Southeast Asia.
“Many of the countries that have made progress domestically still import from countries where the situation isn’t as nice. That is something else we should be conscious of,” says Beth Fulton, a marine scientist with the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Hobart, Australia, who was not involved with the new study.
She adds: “Serious effort has to go into helping nations which do not currently have significant fisheries management capacity to tackle the issues they face, which go beyond a lack of resources.”
Many important fisheries are not included in the new dataset, sometimes because dozens of different species of fish are caught at the same time. That type of fishing activity is more difficult to track as management schemes typically focus on fisheries where a single species is targeted, such as cod or tuna.
Hilborn says: “The unassessed fisheries are largely highly mixed fisheries. They may catch a hundred species in one haul of the net, and you can’t regulate those on a species-by-species basis. So the toolkit for managing those fisheries is going to be different than what we dominantly use in the successes we’ve had so far.”
Unassessed fisheries in India, Indonesia and China represent 30-40% of the world’s fish catch. “China is a big black box. It’s the biggest fishing country in the world. And they have essentially no publicly available assessments of their resources,” Hilborn says.
Steve Palumbi, a fisheries scientist at Stanford University, says some caution is also needed with the data where they do exist. “I’m not as convinced that this shows the universal success of fisheries management schemes,” he says. “Because regional data from the same countries – mostly the US and Canada – show different patterns.” East coast fisheries in both countries have not responded well, whereas west coast fisheries, and Alaska have done better. “It may well be that there has not been enough time for the effect of management to take hold in the eastern fisheries, perhaps because they were so far down to begin with,” he says.
Reg Watson, a marine researcher at the University of Tasmania, says scientists tend to think about fisheries in two distinct ways. “One tries to save the oceans and all its life from the destruction of fishing. While the other tries to focus on the stocks that feed us and provide jobs and support to the millions around the world,” he says. “The typical uncertainty associated with grand assessments of the world’s ocean life leave room for both.”
Focusing on fish stocks might show that a fishery can provide a sustainable supply, he says, but such data don’t necessarily offer a true picture of the health of a marine ecosystem – “like our terrestrial systems they have likely been greatly simplified and now lack much of the diversity and resilience they once had.” Watson adds: “This could be very important in the near future.”
David Adam is a freelance journalist based near London. This article appears courtesy of China Dialogue Ocean and may be found in its original form here. 
from Storage Containers https://maritime-executive.com/article/study-around-the-world-sustainable-fisheries-management-is-working via http://www.rssmix.com/
0 notes
shuga-hill · 5 years
Link
Making Algorithms More Like Kids: What Can Four-Year-Olds Do That AI Can’t?
Thomas Hornigold
Jun 26, 2019
Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.
Alan Turing famously wrote this in his groundbreaking 1950 paper Computing Machinery and Intelligence, and laid the framework for generations of machine learning scientists to follow. Yet, despite increasingly impressive specialized applications and breathless predictions, we’re still some distance from programs that can simulate any mind, even one much less complex than a human’s.
Perhaps the key came in what Turing said next: “Our hope is that there is so little mechanism in the child brain that something like it can be easily programmed.” This seems, in hindsight, naive. Moravec’s paradox applies: things that seem like the height of human intellect, like a good stimulating game of chess, are easy for machines, while simple tasks can be extremely difficult. But if children are our template for the simplest general human-level intelligence we might program, then surely it makes sense for AI researchers to study the many millions of existing examples.
This is precisely what Professor Alison Gopnik and her team at Berkeley do. They seek to answer the question: how sophisticated are children as learners? Where are children still outperforming the best algorithms, and how do they do it?
General, Unsupervised Learning
Some of the answers were outlined in a recent talk at the International Conference on Machine Learning. The first and most obvious difference between four-year-olds and our best algorithms is that children are extremely good at generalizing from a small set of examples. ML algorithms are the opposite: they can extract structure from huge datasets that no human could ever process, but generally large amounts of training data are needed for good performance.
This training data usually has to be labeled, although unsupervised learning approaches are also making progress. In other words, there is often a strong “supervisory signal” coded into the algorithm and its dataset, consistently reinforcing the algorithm as it improves. Children can learn to perform generally on a wide variety of tasks with very little supervision, and they can generalize what they’ve learned to new situations they’ve never seen before.
Even in image recognition, where ML has made great strides, algorithms require a large set of images before they can confidently distinguish objects; children may only need one. How is this achieved?
Professor Gopnik and others argue that children have “abstract generative models” that explain how the world works. In other words, children have imagination: they can ask themselves abstract questions like “If I touch this sharp pin, what will happen?” And then, from very small datasets and experiences, they can anticipate the solution.
In doing so, they are correctly inferring the relationship between cause and effect from experience. Children know that the reason that this object will prick them unless handled with care is because it’s pointy, and not because it’s silver or because they found it in the kitchen. This may sound like common sense, but being able to make this kind of causal inference from small datasets is still hard for algorithms to do, especially across such a wide range of situations.
The Power of Imagination
Generative models are increasingly being employed by AI researchers—after all, the best way to show that you understand the structure and rules of a dataset is to produce examples that obey those rules. Such neural networks can compress hundreds of gigabytes of image data into hundreds of megabytes of statistical parameter weights and learn to produce images that look like the dataset. In this way, they “learn” something of the statistics of how the world works. But to do what children can and generalize with generative models is computationally infeasible, according to Gopnik.
This is far from the only trick children have up their sleeve which machine learning hopes to copy. Experiments from Professor Gopnik’s lab show that children have well-developed Bayesian reasoning abilities. Bayes’ theorem is all about assimilating new information into your assessment of what is likely to be true based on your prior knowledge. For example, finding an unfamiliar pair of underwear in your partner’s car might be a worrying sign—but if you know that they work in dry-cleaning and use the car to transport lost clothes, you might be less concerned.
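As a toy illustration of that kind of updating, here is the calculation in Python (the probabilities are invented purely for the example):
# Toy Bayesian update: how likely is hypothesis H ("something is wrong")
# given evidence E ("unfamiliar clothes in the car")?
prior_h = 0.05          # P(H): belief before seeing the evidence
p_e_given_h = 0.50      # P(E | H): chance of the evidence if H is true
p_e_given_not_h = 0.20  # P(E | not H): they transport lost dry-cleaning anyway

p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
posterior_h = p_e_given_h * prior_h / p_e  # Bayes' theorem
print(round(posterior_h, 3))  # about 0.12: higher than the prior, but still low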
Scientists at Berkeley present children with logical puzzles, such as machines that can be activated by placing different types of blocks or complicated toys that require a certain sequence of actions to light up and make music.
When they are given several examples (such as a small dataset of demonstrations of the toy), they can often infer the rules behind how the new system works from the age of three or four. These are Bayesian problems: the children efficiently assimilate the new information to help them understand the universal rules behind the toys. When the system isn’t explained, the children’s inherent curiosity leads them to experimenting with these systems—testing different combinations of actions and blocks—to quickly infer the rules behind how they work.
Indeed, it’s the curiosity of children that actually allows them to outperform adults in certain circumstances. When an incentive structure is introduced—i.e. “points” that can be gained and lost depending on your actions—adults tend to become conservative and risk-averse. Children are more concerned with understanding how the system works, and hence deploy riskier strategies. Curiosity may kill the cat, but in the right situation, it can allow children to win the game by identifying rules that adults miss because they avoid any action that might result in punishment.
To Explore or to Exploit?
This research shows not only the innate intelligence of children, but also touches on classic problems in algorithm design. The explore-exploit problem is well known in machine learning. Put simply, if you only have a certain amount of resources (time, computational ability, etc.), are you better off searching for new strategies, or simply taking the path that seems to most obviously lead to gains?
Children favor exploration over exploitation. This is how they learn—through play and experimentation with their surroundings, through keen observation and asking as many questions as they can. Children are social learners: as well as interacting with their environment, they learn from others. Anyone who has ever had to deal with a toddler endlessly using that favorite word, “why?”, will recognize this as a feature of how children learn! As we get older—kicking in around adolescence in Gopnik’s experiments—we switch to exploiting the strategies we’ve already learned rather than taking those risks.
These concepts are already being imitated in machine learning algorithms. One example is the idea of “temperature” for algorithms that look through possible solutions to a problem to find the best one. A high-temperature search is more likely to pick a random move that might initially take you further away from the reward. This means that the optimization is less likely to get “stuck” on a particular solution that’s hard to improve upon, but may not be the best out there—but it’s also slower to find a solution. Meanwhile, searches with lower temperature take fewer “risky” random moves and instead seek to refine what’s already been found.
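A minimal sketch of how temperature controls that trade-off, using softmax sampling over a handful of candidate moves (the scores are arbitrary):
import numpy as np

def sample_move(scores, temperature):
    # Softmax with temperature: a high temperature flattens the distribution
    # (more exploration), a low temperature concentrates it on the best-scoring
    # move (more exploitation)
    scores = np.asarray(scores, dtype=float)
    probs = np.exp(scores / temperature)
    probs /= probs.sum()
    return np.random.choice(len(scores), p=probs)

move_scores = [1.0, 2.0, 5.0]                     # arbitrary scores for three candidate moves
print(sample_move(move_scores, temperature=5.0))  # often picks a lower-scoring move
print(sample_move(move_scores, temperature=0.1))  # almost always picks the best move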
In many ways, humans develop in the same way, from high-temperature toddlers who bounce around playing with new ideas and new solutions even when they seem strange to low-temperature adults who take fewer risks, are more methodical, but also less creative. This is how we try to program our machine learning algorithms to behave as well.
It’s nearly 70 years since Turing first suggested that we could create a general intelligence by simulating the mind of a child. The children he looked to for inspiration in 1950 are all knocking on the door of old age today. Yet, for all that machine learning and child psychology have developed over the years, there’s still a great deal that we don’t understand about how children can be such flexible, adaptive, and effective learners.
Understanding the learning process and the minds of children may help us to build better algorithms, but it could also help us to teach and nurture better and happier humans. Ultimately, isn’t that what technological progress is supposed to be about?
0 notes
Text
Using Python to recover SEO site traffic (Part three) Search Engine Watch
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment from our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, that we call “winners vs losers” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach to manually group pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, which is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact that the URL groups had clear patterns (collections, products, and so on), but it is often the case that there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their content because most page templates have different content structures. They serve different user needs, so their structures need to differ.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the XPath of the product image by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking the highlighted element to copy its XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
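As an illustration of this kind of XPath check (not code from the original notebook), a small sketch with requests and lxml might look like the following; the URL and XPath are placeholders.

```python
import requests
from lxml import html

# Hypothetical XPath copied from Chrome DevTools for the main product image.
PRODUCT_IMAGE_XPATH = '//*[@id="main-product-image"]'

def looks_like_product_page(url):
    """Return True if the page contains the element the product-page XPath points to."""
    response = requests.get(url, timeout=10)
    tree = html.fromstring(response.content)
    return len(tree.xpath(PRODUCT_IMAGE_XPATH)) > 0

print(looks_like_product_page("https://example.com/some-product"))  # placeholder URL
```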
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected clean features from the HTML tags, such as class IDs, CSS style sheet names, and so on. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements, as my feature set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and more effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs on millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and write some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
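The exact file names aren’t shown in this copy of the post, so the snippet below is only a hedged sketch of the loading step, assuming two CSV exports named form_counts.csv and img_counts.csv that match the data frames described next.

```python
import pandas as pd

# Assumed file names; the original notebook may load these from Google Drive or a gist instead.
form_counts = pd.read_csv("form_counts.csv")  # one row per URL: form and input element counts
img_counts = pd.read_csv("img_counts.csv")    # one row per image: url, size, width, height

print(form_counts.head())
print(img_counts.head())
```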
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both the form elements and the input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images, so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this. We are capturing the size of the image files, which should be roughly proportional to the product of the width and height of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases, it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique for this is called one-hot encoding.
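In pandas terms, one-hot encoding turns a categorical column into a set of 0/1 indicator columns; a toy illustration (not the notebook’s code):

```python
import pandas as pd

# Each distinct category becomes its own indicator column.
toy = pd.DataFrame({"img_size_bucket": ["small", "small", "large", "medium"]})
print(pd.get_dummies(toy, columns=["img_size_bucket"]))
```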
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
Here is what our processed data set looks like.
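The processed data set itself isn’t reproduced in this scrape, so the following is a hedged sketch of what the binning and encoding might look like, continuing the loading sketch above; the column names (url, size) and the join with form_counts are assumptions.

```python
import pandas as pd

# Aggregate per-URL image statistics, bin them into 50 groups, then one-hot encode the bins.
per_url = img_counts.groupby("url").agg(
    img_count=("size", "count"),
    total_img_size=("size", "sum"),
)

per_url["img_count_bin"] = pd.cut(per_url["img_count"], bins=50, labels=False)
per_url["img_size_bin"] = pd.cut(per_url["total_img_size"], bins=50, labels=False)

features = pd.get_dummies(per_url[["img_count_bin", "img_size_bin"]].astype("category"))
features = features.join(form_counts.set_index("url"))  # add the form/input counts per URL
print(features.head())
```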
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them as the ground truth to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
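The linked code isn’t embedded in this copy of the post, so as a stand-in, here is a hedged sketch of the labeling and split, assuming a labels data frame (url, page_type) produced by the part-two regex work and the features frame from the sketch above.

```python
from sklearn.model_selection import train_test_split

# Assumed: `labels` maps each URL to a page_type such as "product", "category", or "other".
dataset = features.join(labels.set_index("url"))

X = dataset.drop(columns=["page_type"])
y = dataset["page_type"]

# Hold out 30% of the pages so the model is evaluated on URLs it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```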
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code itself is usually quite simple.
We’re using the well-known Scikit-learn Python library to train a number of popular models with a range of standard hyperparameters (settings for fine-tuning a model). Scikit-learn will run through all of them to find the best one. We simply need to feed the X variables (our engineered features above) and the Y variables (the correct labels) to each model, call the .fit() function, and voila!
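The specific models and hyperparameter grids aren’t listed in the text, so this is only a sketch of that kind of grid search over two of the candidates mentioned below; the parameter values are assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Illustrative candidates and grids, not the article's exact setup.
candidates = {
    "linear_svm": (LinearSVC(max_iter=5000), {"C": [0.1, 1, 10]}),
    "logistic_regression": (LogisticRegression(max_iter=5000), {"C": [0.1, 1, 10]}),
}

searches = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    searches[name] = search
    print(name, search.best_score_, search.best_params_)
```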
Evaluating performance
After running the grid search, we find our winning model to be the linear SVM (0.974), with logistic regression (0.968) coming in a close second. Even with such high accuracy, a machine learning model will still make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions, and the counts outside the diagonal are failures. In the confusion matrix above we can quickly see that the model does really well labeling products, but terribly labeling pages that are neither products nor categories. Intuitively, we can assume that such pages would not have consistent image usage.
Here is the code to put together the confusion matrix:
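The original code block isn’t included in this copy, so here is a stand-in sketch using scikit-learn’s confusion_matrix on the winning model from the grid search sketch above:

```python
from sklearn.metrics import confusion_matrix

best_model = searches["linear_svm"].best_estimator_  # assumed winner, per the scores above
y_pred = best_model.predict(X_test)

label_order = sorted(y.unique())
cm = confusion_matrix(y_test, y_pred, labels=label_order)
print(label_order)
print(cm)
```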
Finally, here is the code to plot the model evaluation:
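Again as a stand-in for the missing block, a heatmap of that matrix with matplotlib and seaborn might look like this:

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=label_order, yticklabels=label_order, ax=ax)
ax.set_xlabel("Predicted page group")
ax.set_ylabel("Actual page group")
plt.tight_layout()
plt.show()
```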
Resources to learn more
You might be thinking that this is a lot of work just to tell page groups apart, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a PyData event. I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share them in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
0 notes
alanajacksontx · 5 years
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment from our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, that we call “winners vs losers” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach to manually group pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, which is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact the URLs groups had clear patterns (collections, products, and the others) but it is often the case where there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents because most page templates have different content structures. They serve different user needs, so that needs to be the case.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements as my features set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and get to code some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images on each page, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this. We are capturing the size of the image files, which would be proportional to the multiplication of the width and the length of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known Scikitlearn python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). Scikitlearn will run through all of them to find the best one, we simply need to feed in the X variables (our feature engineering parameters above) and the Y variables (the correct labels) to each model, and perform the .fit() function and voila!
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974) and Logistic regression (0.968) coming at a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well-labeling products, but terribly labeling pages that are not product or categories. Intuitively, we can assume that such pages would not have consistent image usage.
Here is the code to put together the confusion matrix:
Finally, here is the code to plot the model evaluation:
Resources to learn more
You might be thinking that this is a lot of work to just tell page groups, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a Pydata event I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share it in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
from IM Tips And Tricks https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/ from Rising Phoenix SEO https://risingphxseo.tumblr.com/post/184297809275
0 notes
kellykperez · 5 years
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment from our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, that we call “winners vs losers” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach to manually group pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, which is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact the URLs groups had clear patterns (collections, products, and the others) but it is often the case where there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents because most page templates have different content structures. They serve different user needs, so that needs to be the case.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements as my features set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and get to code some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images on each page, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this. We are capturing the size of the image files, which would be proportional to the multiplication of the width and the length of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known Scikitlearn python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). Scikitlearn will run through all of them to find the best one, we simply need to feed in the X variables (our feature engineering parameters above) and the Y variables (the correct labels) to each model, and perform the .fit() function and voila!
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974) and Logistic regression (0.968) coming at a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well-labeling products, but terribly labeling pages that are not product or categories. Intuitively, we can assume that such pages would not have consistent image usage.
Here is the code to put together the confusion matrix:
Finally, here is the code to plot the model evaluation:
Resources to learn more
You might be thinking that this is a lot of work to just tell page groups, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a Pydata event I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share it in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
source https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/ from Rising Phoenix SEO http://risingphoenixseo.blogspot.com/2019/04/using-python-to-recover-seo-site.html
0 notes
bambiguertinus · 5 years
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment from our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, that we call “winners vs losers” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach to manually group pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, which is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact the URLs groups had clear patterns (collections, products, and the others) but it is often the case where there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents because most page templates have different content structures. They serve different user needs, so that needs to be the case.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements as my features set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and get to code some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images on each page, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this. We are capturing the size of the image files, which would be proportional to the multiplication of the width and the length of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known Scikitlearn python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). Scikitlearn will run through all of them to find the best one, we simply need to feed in the X variables (our feature engineering parameters above) and the Y variables (the correct labels) to each model, and perform the .fit() function and voila!
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974) and Logistic regression (0.968) coming at a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well-labeling products, but terribly labeling pages that are not product or categories. Intuitively, we can assume that such pages would not have consistent image usage.
Here is the code to put together the confusion matrix:
Finally, here is the code to plot the model evaluation:
Resources to learn more
You might be thinking that this is a lot of work to just tell page groups, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a Pydata event I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share it in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
from Digtal Marketing News https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/
0 notes
evaaguilaus · 5 years
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment from our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, that we call “winners vs losers” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach to manually group pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, which is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact the URLs groups had clear patterns (collections, products, and the others) but it is often the case where there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents because most page templates have different content structures. They serve different user needs, so that needs to be the case.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements as my features set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and get to code some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images on each page, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this. We are capturing the size of the image files, which would be proportional to the multiplication of the width and the length of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known Scikitlearn python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). Scikitlearn will run through all of them to find the best one, we simply need to feed in the X variables (our feature engineering parameters above) and the Y variables (the correct labels) to each model, and perform the .fit() function and voila!
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974) and Logistic regression (0.968) coming at a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well-labeling products, but terribly labeling pages that are not product or categories. Intuitively, we can assume that such pages would not have consistent image usage.
Here is the code to put together the confusion matrix:
Finally, here is the code to plot the model evaluation:
Resources to learn more
You might be thinking that this is a lot of work to just tell page groups, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a Pydata event I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share it in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
from Digtal Marketing News https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/
0 notes
oscarkruegerus · 5 years
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment from our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, that we call “winners vs losers” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach to manually group pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, which is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact the URLs groups had clear patterns (collections, products, and the others) but it is often the case where there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents because most page templates have different content structures. They serve different user needs, so that needs to be the case.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements as my features set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and get to code some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images on each page, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this. We are capturing the size of the image files, which would be proportional to the multiplication of the width and the length of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known Scikitlearn python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). Scikitlearn will run through all of them to find the best one, we simply need to feed in the X variables (our feature engineering parameters above) and the Y variables (the correct labels) to each model, and perform the .fit() function and voila!
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974) and Logistic regression (0.968) coming at a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well-labeling products, but terribly labeling pages that are not product or categories. Intuitively, we can assume that such pages would not have consistent image usage.
Here is the code to put together the confusion matrix:
Finally, here is the code to plot the model evaluation:
Resources to learn more
You might be thinking that this is a lot of work to just tell page groups, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a PyData event. I got motivated to learn data science after attending the one they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share them in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
from Digital Marketing News https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/
0 notes
click2watch · 6 years
Text
This Scaling Tech Could Let You Sync Bitcoin Straight From Your Phone
“Maybe we don’t have to store everything ourselves.”
That’s Tadge Dryja, cryptocurrency research scientist at the MIT Digital Currency Initiative, explaining the concept behind his bitcoin scaling solution, “utreexo.”
Based on an idea that has been pursued by developers for many years, utreexo seeks to streamline an aspect of bitcoin’s code that leads to heavy storage requirements over time.
Simply put, it addresses what is known as the UTXO set – the data that records which bitcoins have not yet been spent.
Currently, bitcoin nodes must download the entirety of this information – what is known as the “state” – in order to verify it.
With utreexo, though, rather than having to download the entirety of the bitcoin state, bitcoin holders could simply verify that it is correct using a cryptographic proof. This approach could minimize storage requirements to the extent that it might even be possible to run bitcoin on a mobile phone.
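None of the actual utreexo code appears in this article, and the sketch below is emphatically not Dryja’s design (the real accumulator is a dynamic forest of Merkle trees, not a single static tree); it is only a toy hash-based commitment in Python to illustrate the general idea that a verifier can check membership against a small digest instead of storing the full UTXO set:

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of leaf hashes up to a single 32-byte root."""
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash if the level is odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to re-derive the root for one leaf."""
    proof, level = [], leaves[:]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling-is-on-the-left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root, leaf, proof):
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Toy "UTXO set": the verifier keeps only the 32-byte root, not the full set.
utxos = [h(f"txid:{i}|vout:0|50000 sats".encode()) for i in range(8)]
root = merkle_root(utxos)
proof = merkle_proof(utxos, 3)        # supplied by whoever stores the full data
print(verify(root, utxos[3], proof))  # True: membership checked without the full set

The asymmetry is the point of the toy: the verifier keeps a constant-size digest, and each spend carries a small proof instead of requiring the whole state.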
Also known as an accumulator, the tech underpinning utreexo isn’t a new idea – developers have been discussing ways to implement similar kinds of code since bitcoin’s early days – but it was previously met with hurdles to implementation.
Now, due to work by Dryja and others, it is swiftly becoming a reality. Dryja has already created functioning proof-of-concept code in an early prototype.
And he’s not alone. Dryja is joined by cryptography heavyweights Dan Boneh, Benedikt Bünz and Ben Fisch, who have written a paper detailing an alternate accumulator method.
“The high-level goal is basically your phone could run a full node. That is the dream,” Bünz, who is known for his work on bulletproofs, a scaling tech that allowed monero to reduce transaction fees by 96 percent, told CoinDesk.
Bünz’s paper has even been picked up by ethereum researchers, who are investigating how the technology might apply to layer two scaling solution, Plasma.
And part of this flurry of activity stems from the fact that due to the nature of the technology, it doesn’t require a hard fork – a type of software update that requires unanimous support and participation – in order to safely activate. Instead, accumulators would be deployed at the wallet level, which significantly reduces the hurdle to implementation.
“Hard forks are almost impossible on bitcoin. Soft forks are hard as well,” Bünz said, adding:
“It’s great that we can just deploy it, it makes it a lot easier and it means we can have a competition of ideas.”
Growing bigger
Stepping back, accumulators have been discussed since as early as 2010; however, they were previously met with an insurmountable bottleneck – what is known as a bridge node.
And that’s because, in order to function, accumulators require other people within the network to support the software. Previously, this was highly resource-intensive, but Dryja has built a bridge node that doesn’t come with additional trade-offs – meaning that accumulators are now feasible for the first time.
According to Dryja, that’s notable because utreexo could address what has been a long-term pressure point for bitcoin: its increasing UTXO set.
UTXO – which stands for unspent transaction output – is the data structure that gives information about all the outstanding bitcoins on the network.
While it is known to fluctuate (the UTXO count actually decreased in 2018), the dataset tends to increase alongside bitcoin’s usage. This means that, if left unchecked, it could continue to grow, necessitating ever-increasing storage requirements.
In particular, this is something that concerns what is known as a bitcoin “full node,” a type of node that keeps a history of every transaction ever made on bitcoin. Currently, a full node requires about 200 gigabytes of storage – just beyond what a conventional laptop can store.
With accumulators, though, full nodes no longer need to store all of the blockchain data in order to reach consensus about where coins are on the network. Instead, they can simply provide proofs that the data is correct.
“The high level is this idea of separating the consensus away from the state,” Bunz summarized, “Anyone can now be a full node without having to store the data.”
Previously, the mobile use case was addressed by a particular type of client called an SPV client, which requires light wallets to trust other full nodes to have the correct data. Because this comes with weaker security guarantees, accumulators are heralded as a way to achieve the same goal without the trade-offs.
“My hope is that the people who are currently running SPV wallets would be able to use [utreexo] and get the same security of a full node, with the resource requirements that are more similar to SPV,” Dryja summarized.
The competition
But while they are both positioned toward the same goal, there are ways in which Dryja’s utreexo model and the work by Bunz differ significantly as well.
First and foremost, Dryja’s work stands out because it is much closer to deployment. For example, it already has a working prototype and functioning code. Equally, it uses simple mathematics – hash functions that are already familiar to bitcoin.
Bunz’s design, on the other hand, is potentially more efficient and boasts more advanced features. Still, it uses mathematics that, according to Dryja, is riskier and more exotic than his own design.
For example, one stage of Bunz’s accumulators requires a kind of trusted setup – in short, the product of two secret numbers, that if revealed could be compromising to its security.
“We’re using fancier maths to get different properties,” Bunz said,
“The high level differences is [utrexxo] is ready now, it’s based on a simpler thing, it’s based on simple hash function, which is a good thing, but ours has more advanced cool features like batching and aggregating which would be cool at some point.”
Additionally, Bunz’s paper has a section that may have implications for the world’s second largest blockchain, ethereum, as well.
Speaking to CoinDesk, Georgios Konstantopoulos – a researcher and developer for ethereum layer two scaling solution, Plasma – said that due to its applicability, Bunz’s paper had attracted a lot of enthusiasm in the ethereum research community.
For example, Konstantopoulos said that Bunz’s accumulators could even be a more efficient replacement for the most fundamental data structure in ethereum, the Merkle-tree. Additionally, accumulators could help solve a problem inherent to Plasma Cash, which requires users to store large transaction histories.
The enthusiasm was such that Konstantopoulos estimated 10 new designs for how Bunz’s accumulators could apply to ethereum have been proposed, prompting the researcher to undertake a “taxonomy” to analyze the viability of each idea.
He told CoinDesk:
“I’m generally very optimistic that we will find a UXTO compaction scheme for Plasma.”
A ways to go
Still, there’s work that remains on all fronts before the scaling solutions can be considered viable.
Konstantopoulos emphasized that while accumulators could theoretically be useful for ethereum on both layer one and layer two scaling solutions, work remains in order to fully investigate its practical viability.
And both Bunz and Dryja emphasized similar caution as well.
For example, while accumulators have the potential to allow full nodes on mobile phones in terms of storage, they will encounter other hurdles to implementation.
In Dryja’s model, he emphasized that in its current implementation the accumulator is only really useful for bottom-of-the-range computers.
“If you have a fast computer this actually doesn’t help. It will not make much difference or make it slower. But if you have a crummy computer it will make a really big difference,” he continued,
“We want bitcoin to work on crummy computers as well.”
For Bunz’s paper, work remains in order to build a working implementation of the design, which may come with its own unanticipated research problems.
Plus, using the mobile phone as an example, Bunz said that while it would be technically feasible to deploy in terms of storage, the phone would need to be constantly online in order to function.
However, Bunz said that such problems can likely be overcome given sufficient research.
“This is one step of the way for getting us to a space where your mobile phone can run a full node,” Bunz said, “There’s nothing theoretically that stands in the way, we just need to be smart about how we do things.”
He continued:
“There needs to be a lot of new innovation happening, but thankfully there is, and it’s really possible.”
Phone image via Shutterstock
This news post is collected from CoinDesk
The post This Scaling Tech Could Let You Sync Bitcoin Straight From Your Phone appeared first on Click 2 Watch.
More Details Here → https://click2.watch/this-scaling-tech-could-let-you-sync-bitcoin-straight-from-your-phone
0 notes
lucyariablog · 6 years
Text
Are You Really Smart About How AI Works in Marketing?
In its widely talked about State of Marketing Report, Salesforce reports that just over half (51%) of marketers are using AI in one form or another, while another quarter plan to test it over the next two years.
A smaller study of over 500 search, content, and digital marketers by BrightEdge found that just 4% have implemented AI (that’s not a typo).
Who’s right? Salesforce, which reports one in two marketers is using AI, or BrightEdge, which puts the number at one in 25?
The answer may be “neither.” That’s because many marketers (and business leaders as a whole) are confused about which technologies are genuinely AI-powered and which simply rely on advanced algorithms and analytics.
As Luis Perez-Breva, head of MIT’s Innovation Teams Program and research scientist at MIT School of Engineering, explains, “Most of what the retail industry refers to as artificial intelligence isn’t AI.” He says many “confuse analyzing large amounts of data and profiling customers for artificial intelligence. Throwing data at machines doesn’t make machines (or anyone) smarter.”
Rather, AI’s promise is what is often called relevance at scale. It’s the ability of machines to crunch massive datasets and data lakes – structured and unstructured data – and optimize decision-making in a way that algorithm-enabled humans cannot achieve. Perhaps most importantly, in an AI-enabled system the machine learns and improves without human input.
Rather than ask, “How many marketers are using AI?,” the more apt question may be, “What are you doing with it?” Let’s examine some of the ways companies are using AI-led initiatives to make the most of AI’s promise.
HANDPICKED RELATED CONTENT:
Should You Trust Artificial Intelligence to Drive Your Content Marketing?
8 Ways Intelligent Marketers Use Artificial Intelligence
Using AI for personalization
Marketers have long practiced personalization in content marketing, developing over time more sophisticated ways of personalizing the customer journey – whether through marketing automation and progressive profiling or using programmatic advertising to support our content path. The idea is that as we learn more about our customer or prospect and fill in information about that person’s needs, budgets, and interests, we can create unique, personalized experiences that educate and delight the person.
Now we are entering the era of hyper-personalization: the ability to personalize not just by persona, profile, or the trail of breadcrumbs people leave on your site, but by a massive set of user details and signals, analyzed and made actionable by machines.
The retail industry is the most talked about application of AI-led personalization, but most examples you read about don’t really fit the definition of AI … they’re just really good personalization.
The examples that seem to cross over – from algorithm-driven personalization to AI-driven personalization – are those in which the AI sifts through data from multiple channels and sources, learning which signals matter in which circumstances and evolving its approach over time. The key variables that influence how one customer interacts with your brand may be completely different from the variables that define another, multiplied millions of times across each person, each channel, and each step of the process – and changing constantly.
HANDPICKED RELATED CONTENT: Cognitive Content Marketing: The Path to a More (Artificially) Intelligent Future
Using AI for voice-searchable entertainment and education
A less common but exciting application for AI-enriched content? Virtual assistants. Alexa (Amazon) offers developers the chance to build “skills” on its platform. Alexa Skills help customers answer questions, gather information, and even control internet-enabled devices and appliances. (To be fair, there’s disagreement about whether Alexa is an AI technology or just an advanced natural language technology – another nod to the problem of assessing AI adoption.)
Companies far and wide are racing to launch Alexa Skills – both to inform and delight customers as well as to test out the channel’s promise.
Entertainment
Content-rich brands are delivering entertainment and information via Alexa Skills. Disney’s Character of the Day Skill introduces a new character each day from Disney, Pixar, Marvel, and Star Wars. Or you could try out Cat Translator to understand the “why” behind weird cat behavior.
Real-time news
Media companies have been among the first to offer content snippets via Alexa Skills. If you enable the NPR News Hour Skill, for example, you’ll have access to a five-minute news summary, refreshed every hour. Big brands are quickly jumping in too. J.P. Morgan customers can access investment news: “Send me the latest research report from Joyce Chang” or “Send me the tear sheet for eBay.”
Customer service and engagement
Global consumer brands are enabling e-commerce, customer service, and analytics using Alexa Skills. The Capital One Skill lets you ask Alexa, “How much did I spend at Target last month?” or “When is my mortgage payment due?”
For content marketers, there are interesting opportunities to deliver education and entertainment via voice-enabled search. Beauty brand Wunder2 was the first in its segment to launch an Amazon Alexa Skill. The company offers a daily beauty tip via Skills, from how to thicken the appearance of your brows to how to achieve healthier looking hair. As one reviewer explained, “It’s very cool when I can get the latest beauty tips while having my hands free to apply my makeup.”
Wunder2 co-founder and CEO Michael Malinsky tells Forbes, “As a business, we are fascinated with the rapid integration of AI into people’s lives. We think the level of adoption will exceed many people’s expectation and create fluid recommendation experiences using AI technology found in Google Home, Alexa, and the recently launched Apple HomePod. It is something we are absolutely developing already.”
HANDPICKED RELATED CONTENT: How to Set Your Content Free for a Mobile, Voice, Ready-for-Anything Future
Using AI to put email on steroids
For marketers, AI-enabled decision-making for customizing and delivering email (i.e., dynamic emails) could be a game-changer.
Once upon a time, marketers would ask, “What’s the best time of day to send out our email newsletter?” Through trial and error, marketers discovered that certain days and times yielded higher open rates on average.
AI, however, allows marketers to send emails based on the open histories of individual users (or people like him/her in the absence of better data). And no longer will marketers send promotions to huge swaths of their audience. Instead, promotions will be designed uniquely for prospects based on a wide range of signals, from cart abandonment in retail to which times of day an individual is most likely to sign up for a conference. Finally, AI will enable much more customized and nuanced customer journeys. That leads to our next AI application – one which is too often misunderstood.
HANDPICKED RELATED CONTENT: Scale Your B2B Content With Artificial Intelligence: Ideas and Tools Marketers Can Try
Using AI to write
Long decried as evidence that AI will usher in a new soulless age, machine-made content is one of the most controversial applications of AI … but, under the right circumstances, it may be the most pro-creative. Let me explain.
As machine-made content becomes better at approximating human language, there’s a clear case for its use in content marketing. Not all content generated by marketing needs to be highly creative and witty, after all. Many organizations are already using machine-generated content, such as Edmunds generating vehicle profiles based on manufacturer data and Homesnap publishing community profiles based on publicly available data. The best applications are those in which there’s a need to publish at scale and the content is somewhat “modular” or easily put together from pieces and parts.
And, if you’re not convinced, perhaps this will change your tune. Even The Washington Post uses machine-generated content. According to Digiday, as of September 2017, the paper’s robot writer (a solution from Heliograph) had published 850 articles and tweets like this one:
Landon beat Whitman 34-0; https://t.co/V6zVPi7a9O @LandonSports @koachkuhn
— WashPost HS Sports (@WashPostHS) September 2, 2017
The key is in how you pair the robot to the writing. For The Washington Post, Heliograph generated articles about local political races, where the paper didn’t have the resources to assign reporters but had data to fill in the story. It also published short summaries about the Olympics in Rio via machine. (The paper reports that four employees previously took 25 hours to collect, analyze, and report on a small portion of local election results. Using Heliograph, The Washington Post created more than 500 articles generating 500,000 views.)
And therein lies the most powerful promise of AI: to release marketers from the mundane to focus on more creative and fulfilling efforts. Marvin Chow, vice president of global marketing at Google, writes that artificial intelligence and machine learning “will spark new ideas and push the boundaries of creativity. With new tools, what will makers, artists, and musicians design? And how will that affect the marketing world we work in?” The full vision is still out of reach, but early signs point to a machine-led period of creative efficiency.
HANDPICKED RELATED CONTENT:
Content Creation Robots Are Here [Examples]
Will Artificial Intelligence Replace Manual Content Creation?
A version of this article originally appeared in the August issue of  Chief Content Officer. Sign up to receive your free subscription to our print magazine every quarter.
Discover more about how to use AI (and how not to use it) at Content Marketing World Sept. 4-7 in Cleveland, Ohio. Register today and use code BLOG100 to save $100.
Cover image by Joseph Kalinowski/Content Marketing Institute
The post Are You Really Smart About How AI Works in Marketing? appeared first on Content Marketing Institute.
from https://contentmarketinginstitute.com/2018/08/ai-works-marketing/
0 notes