How 3D Printing Will Impact Our Future: A Rundown of Companies to Keep Your Eyes On

By Rudy de Waele
The first time I saw a 3D printer in action was when I participated in the Singularity University Executive Program in the spring of 2011. The program offered corporate executives and entrepreneurs the tools to predict and evaluate how emerging technologies would disrupt and transform their industries, companies, careers and lives.
Since then, I have been following the explosion of 3D-printing products and services closely, and it has become an integral part of most of my talks for clients and at conferences.
During the program we visited TechShop, where we experimented with miniature 3D modeling, as well as the Autodesk offices in San Francisco. Those visits really blew my mind as I realized how broadly 3D printing could be applied and the impact it could have in many different sectors. It was incredible to see at the 3D-print show in London last week how much this industry has grown in just three years.
Everything Apple Announced At Its September 2014 Keynote
By Natt Garun
After months of anticipation, the iPhone 6 is finally official… along with a bunch of other stuff! Here’s a recap of the highlights from Apple’s event today in Cupertino in case you were having some technical problems with the livestream. For full details, check out the link(s) below each highlight.
Why Big Data Has Some Problems When It Comes to Public Policy

By Derrick Harris
For all the talk about using big data and data science to solve the world’s problems — and even all the talk about big data as one of the world’s problems — it seems like we still have a long way to go.
Earlier this week, an annual conference on data mining, KDD 2014 for short, took place in New York with the stated goal of highlighting “data science for social good.” It’s a noble goal and, indeed, the event actually did highlight a lot of research and even some real-world examples of how data can help solve various problems in areas ranging from health care to urban planning. But the event also highlighted — to me, at least — some very real obstacles that stand in the way of using data science to solve society’s problems in any meaningful way at any meaningful scale.
Citizen’s Drone Video Shows Damage of Napa Earthquake
By Jeff John Roberts
A Napa resident used his drone to capture the fallout from the 6.0-magnitude earthquake that rattled the California town on Sunday, and posted scenes of the damage on YouTube.
Instagram Launches Advertising Analytics
By Selena Larson
Instagram is making it possible for businesses to find out just how well their advertising performs.
On Thursday, the company rolled out a suite of business tools to manage ad campaigns. Account insights and ad insights display impressions, reach, and engagement, both for particular ad campaigns and the account itself. An advertising staging feature enables advertisers to edit and preview campaigns before launching.
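The metrics themselves are straightforward to compute from raw ad events. As a rough sketch (the event format here is an assumption for illustration, not Instagram's API), impressions count every view, reach counts unique viewers, and engagement counts interactions:

```python
def summarize_campaign(events):
    """events: dicts like {"user": "u1", "type": "view" / "like" / "comment"} (assumed format)."""
    impressions = sum(1 for e in events if e["type"] == "view")             # total ad views
    reach = len({e["user"] for e in events if e["type"] == "view"})         # unique viewers
    engagement = sum(1 for e in events if e["type"] in ("like", "comment"))
    return {"impressions": impressions, "reach": reach, "engagement": engagement}

events = [
    {"user": "u1", "type": "view"}, {"user": "u1", "type": "view"},
    {"user": "u2", "type": "view"}, {"user": "u2", "type": "like"},
]
print(summarize_campaign(events))   # {'impressions': 3, 'reach': 2, 'engagement': 1}
```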
Instagram tested the tools with a handful of advertisers ahead of this week’s rollout, and plans to make them available to all brands later this year.
Instagram launched advertising last fall.
Eight Charts That Put Tech Companies’ Diversity Stats Into Perspective
By Carmel DeAmicis
The latest hot-button subject in tech, hotter even than ephemeral apps, is diversity. Or at least, if not actual diversity, the act of releasing employee diversity statistics. From Apple to Twitter, almost all the big names in Silicon Valley are doing it. Google fell first in May, and with some pushing by activist organizations the rest soon followed suit.
We’ve broken down some of the top players – Apple, Twitter, Pinterest, Facebook, Google, Yahoo, Microsoft, eBay, LinkedIn, Cisco, Intel, and HP – comparing their overall gender and ethnicity demographics. Then we went a step further to look specifically at the tech and leadership roles. Where relevant, we also charted the demographic information of the U.S. labor force and the graduating computer science class.
Just How Creepy Can Targeted Ads Get? New Tool Promises to Tell You

By Selena Larson
Ever find yourself scrolling through a website and seeing an advertisement that’s a little too well-targeted? You know, as if the advertiser knew you recently twisted your ankle and need to buy some sturdier shoes?
Columbia University researchers are working on XRay, a tool to help innocent Internet users make sense of those ads that stalk us, sometimes in ways that are worse than creepy.
DataGravity Says It’s Time For Your Storage to Smarten Up Already

By Barb Darrow
DataGravity, the thus-far secretive startup co-founded by Paula Long of EqualLogic fame, is finally ready to talk about its DataGravity Discovery storage array.
Aggregating data about the data
What kind of important information does the array gather about the data it stores? For instance: Who at the company accessed a file and how often? Who is working together on shared files? Is there personally identifiable information (PII) or credit card information sitting in documents? Which files have not been touched in two years? All of that is really interesting data about that stored data — and it can be used for compliance and governance purposes, Long said in a recent interview.
The idea is to catalog and expose that data so it can be of use to admins or execs, and do all of that in the array without needing a lot of add-on software products.
“We’ve integrated data analytics into storage — as data is ingested we capture who’s reading and writing it using Active Directory or LDAP. We capture who’s interacting with the data on the front end, we provide audit and activity trail, and on the backend we index over 400 data types,” Long said.
That can lead to some interesting “aha!” moments. One beta tester found a termination letter addressed to him, Long said. His panic subsided when he realized that his agency automatically generates such letters when employees are reassigned to other departments.
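What Long describes amounts to an audit trail plus lightweight content classification at ingest time. A minimal sketch of that idea (the field names and the card-number rule are assumptions for illustration, not DataGravity's implementation) might look like this:

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Pattern for 13-16 digit card-like numbers, allowing spaces or dashes (illustrative only).
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def record_access(catalog, path, user, action):
    """Append an audit entry for a file: who touched it, how, and when."""
    catalog[path].append({
        "user": user,                      # e.g. as resolved via Active Directory / LDAP
        "action": action,                  # "read" or "write"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

def flag_possible_pii(text):
    """Return True if the document contains something that looks like a card number."""
    return bool(CARD_PATTERN.search(text))

if __name__ == "__main__":
    catalog = defaultdict(list)
    record_access(catalog, "/shares/finance/q3.xlsx", "alice", "write")
    record_access(catalog, "/shares/finance/q3.xlsx", "bob", "read")
    print(catalog["/shares/finance/q3.xlsx"])
    print(flag_possible_pii("Card on file: 4111 1111 1111 1111"))   # True
```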

Uncovering insights
Another beta tester, Mark Lamson, director of IT for the Westerly Public Schools in Westerly, Rhode Island, said DataGravity pulled attendance data from various files and the school system was able to find a student with a perfect attendance record. “Instead of it being a gotcha thing about truancy, we found something positive to celebrate,” he said.
Chris Berube, IT manager for the Law Office of Joe Bornstein, a Portland, Maine-based law firm, said that while moving some data, a file popped up with a company credit card number in it. That’s the kind of thing that can cause problems.
The data about the data can help admins designate unused files for archiving. A planned release will offer that off-boarding capability to AWS Glacier or other inexpensive archives. The company may also add an OCR capability that will scan PDF documents and make them searchable as well.
The company adapted open-source search technology with its own secret sauce to provide Google-like search capability. The data indexing takes place on the array’s secondary spindle. The new DataGravity Discovery arrays will be on display next week at VMworld 2014.

Long has lots of fans in the tech community. EqualLogic blazed the trail for iSCSI storage when it launched in 2001. It was headed for an IPO when Dell bought it in 2008 for about $1.4 billion. Nashua, New Hampshire–based DataGravity has raised about $42 million in venture funding from Andreessen Horowitz, Charles River Ventures and others.
To hear more from Paula Long about storage and other trends in tech infrastructure, check out this video of her panel at Structure 2014.
Teenager Could Revolutionize Malaria Testing with a Cheap Smartphone Attachment

By Pavel Curda
Tanay Tandon is the 17-year-old founder of Athelas, a blood-testing kit for smartphones that is designed to diagnose malaria.
In short: a malaria test that requires no expertise, takes a few seconds, and costs next to nothing, all on a smartphone, with the potential to save thousands of lives.
We first reported on Athelas earlier this month when it won Y Combinator’s first ever hackathon, earning Tandon a chance to interview for a place on the well-respected accelerator program. We caught up with him to find out more about his creation and what’s next.
TNW: What is Athelas?
Tandon: Athelas is a low-cost microscope lens attachment for smartphones, coupled with a set of backend algorithms for computer vision based malaria detection.
Using machine learning I was able to put together a set of template-matching and edge-detection approaches for cell-counting and classification. Currently I’m pursuing the work in a more research-oriented fashion, validating the algorithm accuracy as well as testing the microscope attachment on a variety of blood smears and diseased tissue.
The ultimate goal is to be able to detect malaria in blood samples within seconds, as well as eventually train the algorithms to identify a variety of other diseases and conditions.
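Tandon has not published his code, but the general approach he describes (edge detection plus contour filtering to count cells) can be sketched with OpenCV. The thresholds and area limits below are placeholders, not his tuned values:

```python
import cv2  # OpenCV 4.x

def count_cells(image_path, min_area=50, max_area=5000):
    """Rough cell count via edge detection and contour filtering (illustrative thresholds)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)              # suppress sensor noise
    edges = cv2.Canny(blurred, 50, 150)                      # edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only contours whose area is plausibly a single cell.
    cells = [c for c in contours if min_area < cv2.contourArea(c) < max_area]
    return len(cells)

if __name__ == "__main__":
    print(count_cells("blood_smear.jpg"))   # path is a placeholder
```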
TNW: What problem are you solving? Is your solution disruptive?
Tandon: Malaria and related blood-based diseases plague much of the world’s rural areas. Unfortunately, high-cost morphology equipment and trained cell morphologists are not available on demand in these areas. Athelas conducts automated blood analysis through computer vision, and brings the microscope down to a low-cost system attached to a smartphone ($5 in production estimates, as opposed to the current $600-$700 setups).
More research is currently being conducted to boost the accuracies as well as make a smaller more efficient lens attachment.

TNW: Lab tests are highly regulated to ensure the correct results are delivered. The threat of false positives or incorrect results might be a concern for the pharma business. How accurate is Athelas?
Tandon: In terms of accuracy, the score from cross-validation on training data is coming in between 0.9 and 0.95 for cell identification and counts. However, that training data consists of higher-resolution images than live data. For live blood samples (blood smear images taken directly from the smartphone camera attachment) the accuracy is currently in the range of 0.7 to 0.75. This drop is due to the lower resolution and blurred images from the microscope.
Over the next few months, the main goal is to incorporate some sort of low-cost staining procedure, as well as multi-fusion image stitching algorithms, to boost these numbers to more accurate ranges, comparable to the 0.9 to 0.95 from training data.
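For context, the cross-validation figure Tandon quotes is the standard score produced by something like the following (a generic scikit-learn sketch with placeholder data, not his actual pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: rows are per-cell feature vectors, labels mark infected vs. healthy.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)          # 5-fold cross-validation
print(f"mean accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```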
TNW: How do you plan to commercialize Athelas?
Tandon: The first step will be to build out a viable, low-cost version of Athelas for deployment in rural areas. Initially, the focus will be heavily on a few sets of diseases including Malaria and blood-borne pathogens. Once the first version of Athelas is in use with rich datasets being collected, it would likely be best to commercialize Athelas to hospitals and related medical providers.

TNW: What does the prize from Y Combinator mean to you?
Tandon: The prize from Y Combinator means a lot – for one, it was a good validation of the importance of working on humanitarian and medical problems. Second, it shows that interest in the biomedical field is growing, and it has encouraged me to work on more problems just like Athelas in the future.
The goal for the next couple months will be to get a good validation study in, and build a 3D-printed version of the microscope for simpler production and usage.
TNW: Who is Tanay? Tell us something about your plans.
Tandon: I’m currently a high school senior at Cupertino High. I love working on interesting computer science problems, and my previous experience is in machine learning and natural language processing with the startup I founded, Clipped.
I strongly believe that artificial intelligence and research can be used to drive innovative changes in society, and my goal as I enter college and the workforce will be to continue working on products such as Athelas that can enact positive changes through the power of computer science. In the near future, I will be readying a production-ready model for Athelas and presenting it at the upcoming Y Combinator interview.
TNW: Any tips for starting entrepreneurs from your side?
Tandon: From my previous experience with Clipped, the biggest lesson I learned was that ideas are cheap, but execution is key. In other words, it’s very easy to come up with that initial, seemingly game-changing idea, but the importance lies in building out the first version and getting users to try it out.
As long as the focus is on building prototypes, failing fast with those prototypes, and quickly coming out with the next version, the creative juices will continue to flow – allowing innovative products in the long run.
Top image credit: AFP/Getty Images
A Fascinating Visualization of How Culture Expanded Around the World
By Matthew Elworthy
The theme of this year’s #TNWUSA conference is ‘Where Business and Cultures Collide.’ We’ll be focusing on international growth and regional technology trends, as well as best business practices and experiences when it comes to expanding abroad.
One of the most exciting things we’ve noticed about 2014 has been the number of tech companies testing the waters of unfamiliar markets. Despite our unparalleled access to information today (thanks to a little thing called ‘the internet’), cultural and regulatory differences around the world can trip up the sturdiest of expansion plans. #TNWUSA ‘14 aims to help you get to grips with these cultural contrasts and market differences, armed with a new range of tools and techniques.
So when we read a paper recently published in Science on the migration and history of culture across Europe and North America, we got a little bit excited.
The movement of culture over two millennia
As the above image demonstrates, “A Network Framework of Cultural History” by Maximilian Schich, Mauro Martino et al. is more than ‘just’ an academic paper – it’s a breathtaking visual history of the birthplaces, deathplaces, and therefore the migratory movement of more than 150,000 notable cultural thinkers across the last 2,000 years.
Schich, Martino, and the rest of the team procured their list of VIPs (Very Intelligent People) from the last two millennia using available databases – including Freebase, the General Artist Lexicon, and the Getty Union List of Artist Names. They then set to work establishing, visualising, and attempting to understand what the movement of culture looks like on a global scale.
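The underlying data model is simple: every notable person contributes one birth-to-death arc, and the visualisation aggregates those arcs over time. A toy sketch of counting the largest flows (the three records below are made up for illustration; the real study drew on the databases named above) could look like this:

```python
from collections import Counter

# Each record: (name, birth_place, birth_year, death_place, death_year)
people = [
    ("Rembrandt van Rijn", "Leiden", 1606, "Amsterdam", 1669),
    ("Baruch Spinoza", "Amsterdam", 1632, "The Hague", 1677),
    ("Vincent van Gogh", "Zundert", 1853, "Auvers-sur-Oise", 1890),
]

# Count migration "arcs" from birthplace to deathplace.
flows = Counter((birth, death) for _, birth, _, death, _ in people if birth != death)

for (origin, destination), count in flows.most_common():
    print(f"{origin} -> {destination}: {count}")
```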
The evolution of Europe
Here’s what a map of the movement of cultural thinkers across Europe looks like – including the rise and fall of Rome, the arrival of Paris on the cultural scene, and the influence of the British Empire (births are in blue; deaths are marked in red):
The mass-exodus stateside
And here’s what the rise (and rise) of the United States looks like. Notice the initial trickle originally hailing from Europe to the East Coast, and gradually making its way to the Wild Western Frontier (specifically San Francisco). This is followed by the eventual EXPLOSION of movement across the country, due to industries such as Hollywood and Silicon Valley – not to mention the arrival of affordable automobiles. Of course, New York is also there, shining brighter than almost everywhere else on the map:
One of my favourite things about the team’s plotting and scheming schematics is the visual evidence they produce for local instabilities in human mobility dynamics. For example: we see the intellectual monolith that was Rome and her empire give way to other cultural hotspots such as Vienna, Berlin and Amsterdam (home to The Next Web). Perhaps unsurprisingly, Amsterdam had its highest global share of notable people during the 17th century – the Dutch Golden Age. Think Rembrandt, Spinoza and Michiel de Ruyter.
After a significant drop-off, the Dutch capital steadily reclaimed ground with the intelligentsia through the 20th and 21st centuries; today, Amsterdam is one of Europe’s most fertile hotbeds for artists, advertising agencies, and tech startups. In contrast, New York started to claim the lion’s share of notable people in the 20th and 21st centuries, far outpacing every other international hub except for London and Los Angeles. And when it comes to LA (or more specifically, Hollywood), we see roughly 10 times as many deaths as we do births. Make of that what you will!
#TNWUSA 2014
If this beautiful research is a history of the migration of culture, business and influence around the world, then #TNWUSA ‘14 will be a comprehensive opportunity to survey the contemporary landscape. However, there will be at least one significant difference: due to the datasets available to the team behind “A Network Framework of Cultural History,” there is a strong emphasis on the cultural influencers of Europe and the US. At #TNWUSA, we’re aiming to accommodate a slightly broader range of markets.
The conference on December 10 in New York will also see attendees, investors and influencers come together from a whole range of exciting up-and-coming regions, such as South-East Asia and South America.
When it comes to speakers, our lineup currently includes the likes of David Weinberger, one of the most influential American authors on technology, and Hilary Mason, a world-leading authority on data science.

David is best known for his work discussing the influence of technology on human communication, relationships and ideas, including the internationally acclaimed book The Cluetrain Manifesto. He is also a senior researcher at Harvard’s Berkman Center for Internet & Society, and Co-Director of the Harvard Library Innovation Lab. Hilary is currently Data Scientist in Residence at Accel – the leading venture and growth equity firm – as well as Scientist Emeritus at bitly, co-founder of HackNY, co-host of DataGotham, and a member of NYCResistor.
Both will also be joined by Nir Eyal, entrepreneur and author of Hooked: How to Build Habit-Forming Products. #TNWUSA2014 takes place on December 10 in New York City. Click here to purchase limited super early bird tickets.
For a copy of the research paper ‘A Network Framework of Cultural History,’ please visit Science. A 5-minute version of the data visualisation – accompanied by a descriptive voiceover – can be found here.
Big Data’s Poster Child Has Issues – But They’re Not Slowing Hadoop Down

By Matt Asay
Pity poor Hadoop. The open-source software framework is virtually synonymous with the Big Data movement. Yet one of its earliest, biggest users has joined a chorus of critics, charging Hadoop with being “unpredictable” and “risky.” Others, like Gartner’s Merv Adrian, worry about its weak security provisions.
Despite these (mostly) valid concerns, people and organizations are still lining up to adopt Hadoop, which makes it possible to store and process huge amounts of data on clusters of commodity hardware. Let's assume for the sake of argument that the entire planet hasn't just been hoodwinked into the Hadoop embrace. Why does it remain so successful?
Loopholes In Hadoop
As the poster child for the Big Data movement, it's not surprising that Hadoop is often given a free pass when it comes to many of its weaknesses. Still, there are an awful lot of them.
As one of the earliest users of Hadoop at Yahoo!, Sean Suchter seems qualified to point out Hadoop's weak operational capabilities. Among the concerns he highlights: Hadoop can usually ensure that a data job completes, but it is unable to guarantee when the job will be completed. Hadoop jobs often take longer to run than anticipated, making it risky to depend on the job output in production applications. When a critical production job is running, other, lower-priority jobs can sometimes swallow up the cluster’s hardware resources, like disk and network, creating serious resource contentions that ultimately can result in critical production jobs failing to complete safely and on time.
And then there's security. Gartner analyst Merv Adrian polled enterprises for their biggest barriers to Hadoop adoption. Among unsurprising results like "undefined value proposition," Adrian was particularly interested by how few seemed to care about Hadoop's security:
In response, he says, "Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there [are] numerous concerns."
Given the type of data—e.g., credit card transaction data, health data, etc.—commonly being used with Hadoop, it's surprising that so few seem to be thinking about security. But it's also surprising that these and other concerns don't seem to be holding back Hadoop adoption.
The Hadoop Train Has Left The Station
And let's be clear: none of these concerns has slowed Hadoop's rise. As IDC finds, over half of enterprises have either deployed or are planning to deploy Hadoop within the next year, with over 100,000 people listing Hadoop as part of their talent profile on LinkedIn:
In part this broad adoption reflects a characteristic of Hadoop: It's open source and encourages data exploration in a way that traditional technologies like enterprise data warehouses cannot. As Alex Popescu notes, Hadoop "allows experimenting and trying out new ideas, while continuing to accumulate and storing your data."
Developers and other users know it's complex and understand its other limitations, but the upside of quickly downloading the technology and using it to store and analyze large quantities of data is too tempting.
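That low barrier to entry is easiest to see with Hadoop Streaming, which lets you run an ad hoc job with nothing more than a small script. Here is a hedged sketch of a word count (the input and output paths in the usage comment are assumptions):

```python
#!/usr/bin/env python3
"""Hadoop Streaming word count sketch: pass 'mapper' or 'reducer' as the first argument.

Example launch (paths are placeholders):
  hadoop jar hadoop-streaming.jar -files wc.py \
      -mapper "python3 wc.py mapper" -reducer "python3 wc.py reducer" \
      -input /data/logs -output /data/word_counts
"""
import sys

def mapper():
    # Emit each word with a count of 1, tab-separated.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so counts for a word are contiguous.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "mapper" else reducer()
```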
Also, there seems to be a growing awareness that the pace of innovation in the Hadoop community is so fast that today's challenges will likely be resolved by tomorrow. As such, Forrester analyst Mike Gualtieri declares that "[t]he Hadoop open source community and commercial vendors are innovating like gangbusters to make Hadoop an enterprise staple" to the point that it will "become must-have infrastructure for large enterprises."
And, Not Or
One other reason that Hadoop has proved so successful is that it's not really growing at anyone's expense. Hadoop doesn't displace existing data infrastructure; it just adds to it.
As Cloudera's Christophe Bisciglia notes: Rather than replace existing systems, Hadoop augments them by offloading the particularly difficult problem of simultaneously ingesting, processing and delivering/exporting large volumes of data so existing systems can focus on what they were designed to do.
Still, while Hadoop isn't likely to replace an enterprise data warehouse today, interest in Hadoop is booming relative to its EDW peers:
Hadoop isn't perfect. It's not manna from heaven that will feed billions or foster world peace. But it's promising enough that enterprises are willing to overlook its problems today to benefit from its power tomorrow.
Lead image by Arpit Gupta
The Future of Content Consumption, Through the Eyes of Yahoo Labs

By Derrick Harris
After years of struggling through a public identity crisis, it appears Yahoo has decided, for better or worse, that it’s a content company. There will be no Yahoo smartphones or operating systems, no Yahoo Fiber, and no Yahoo drones, robots or satellites. But that doesn’t mean the company can’t innovate.
When it comes to the future of web content, in fact — how we’ll find it, consume it and monetize it — Yahoo might just have the inside track on innovation. I spoke recently with Ron Brachman (pictured above), the head of Yahoo Labs, who’s now managing a team of 250 (and growing) researchers around the world. They’re experts in fields such as computational advertising, personalization and human-computer interaction, and they’re all focused on the company’s driving mission of putting the right content in front of the right people at the right time.
Really, it’s all about machine learning
However, Yahoo Labs’ biggest focus appears to be on machine learning, a discipline that can easily touch nearly every part of a data-driven company like Yahoo. Labs now has a dedicated machine learning group based in New York; some are working on what Brachman calls “hardcore science and some theory,” while others are building a platform that will open up machine learning capabilities across Yahoo’s employee base.
There’s also a related data science group, also in New York, that’s doing more applied research with product teams, “and we’ve hired machine learning scientists into almost every other group we have,” Brachman said. They’re working on everything from advertising to data centers, from social-graph analysis to network security.
But advertising is what pays the bills at Yahoo, and if there’s nobody to view the content, there’s nobody to see the ads. That’s why a lot of machine learning research is focused on making it easier for Yahoo’s users to get what they need. That means making images and videos as searchable as web pages, and making everything more searchable using natural language.

And while Yahoo Labs has hired a large number of Ph.D.s since Marissa Mayer became CEO, some of its talent in the content space has come about, fortuitously, from acquisitions made with little or no input from Brachman. One of Summly’s technical leaders, for example, joined Yahoo Labs after that acquisition and created a technology for summarizing multiple documents that Brachman says is integral to the Yahoo homepage. He said the SkyPhrase team, which Yahoo acquired in December 2013, is a natural fit for Yahoo Labs given its research background and its cutting-edge natural-language processing technology.
“We want people to be able to access Yahoo products from wherever they are, through whatever type of device they have,” Brachman said about the promise of NLP. It might be that users have vision problems, or aren’t in front of a screen but still need to know the answer to a question or track down a piece of content.
“I think in the longer-term future of Yahoo … natural language understanding is going to be very important,” he added.
Cranking up deep learning and artificial intelligence
Right now, though, when many people think of machine learning and the web, they’re thinking of computer vision. Whether it’s based on deep learning or some other set of techniques, everyone — Google, Microsoft, Facebook, Pinterest, Dropbox, Twitter — seems to be investing in figuring out how to make sense of all visual content they have. And indeed, Brachman said, Yahoo Labs is “doing all kinds of cool stuff with Flickr and image search,” and is also working on ways of indexing and recommending videos.
He specifically called out work on computer vision algorithms that can determine what makes a good picture of a human — the right angles, right colors, etc. — and automatically curate search results to put the best images up top. Brachman’s team has also developed a method for producing video “thumbnails” so users can get a better sense of what they’re about and what they contain. Then, of course, there are object-recognition efforts, led largely by researcher Jia Li, to automatically tag Flickr images so users don’t need to know how they’re titled or tagged in order to actually find them.
“We’re doing some of that, as well,” Brachman said, referencing public claims by Google and Microsoft about the advances their deep learning research has had on the accuracy of image classification. However, he added, “We haven’t made a big public fuss about this like some of our colleagues out there have done.”

In fact, Bobby Jaros, the co-founder and CEO of LookFlow, a deep-learning-based computer vision startup Yahoo bought in October 2013, originally was embedded within the Flickr team but has joined Yahoo Labs in an effort to grow out a deep learning team. Presumably, that team will work on applying those techniques in areas beyond what Li is already researching in computer vision. Brachman cited advertising, recommendations, personalization and privacy as other areas where “neural-net-flavored deep learning will be a nice additional tool in our toolbox.”
He’s excited, but realistic, about what deep learning and other new approaches to artificial intelligence could mean to a company like Yahoo over the next few years. “Back in the earlier days … we were doing research that felt more speculative than it does now, in a way, because so much of artificial intelligence has become real,” Brachman said.
For example, when he created the DARPA framework that ultimately led SRI to develop Siri, he was skeptical that the work could be completed in five years, and now “Siri is in the pocket of some humongous number of people around the world,” he said. These successes have inspired the machine learning community as a whole, which is coming around on the idea of scaling up all sorts of approaches previously confined to laboratories. Stuff that was deemed futuristic 20 years ago is now legitimate, which is not something Brachman necessarily would have predicted.
“People are starting to entertain those things because they’ve seen artificial intelligence impact society and business,” Brachman said.
But it won’t all be smooth sailing as today’s researchers try to take today’s hot techniques to the next level. Brachman said there are still “huge advances” that need to happen before we have full-scale AI systems, starting with relatively simple things such as connecting the results of a deep learning algorithm to a knowledge graph for the sake of correcting the learning model’s mistakes. “No one knows how to do that,” he said.

Tying it all together into a ubiquitous Yahoo
When everything comes together — everything Yahoo Labs is presently working on, and probably some stuff not yet on its plate — Brachman envisions a Yahoo that’s as ubiquitous as computers seem destined to be. Phones, watches, public terminals, brain implants — Yahoo wants to be able to deliver content to all of them. That probably means rethinking the inputs (e.g., typing, voice, and probably even video and location data) as well as what the applications actually look like and how users expect stuff to be delivered in any given situation to any given device.
“We really want to understand the human element of being mobile,” Brachman said. “If we ever build an integrated, useful Yahoo presence,” he added, “it needs to know about all these things.”
Of course, this being Yahoo, advertising will always play a role in how the company designs its future. Brachman said areas such as “advertising science” continue to be important, even though it has been years since Yahoo first claimed to have mastered the targeted ad. As devices and ad types evolve, design strategies and optimization algorithms need to evolve with them. One of Yahoo Labs’ bigger recent projects, for example, was a new advertising platform called Gemini that lets advertisers manage their mobile search and native advertising campaigns through a single system.
In the ubiquitous computing future Brachman predicts, Yahoo and its advertisers will have to figure out how to make ads something people actually desire. He points to current-day fashion and bridal magazines as media that people actually buy for the ads. Others might point to a digital platform like Pinterest.
“What we need to think about at a deep conceptual level is ‘What is content?’ ‘What are communications between humans?’ ‘What is advertising?'” Brachman said. “… All in the abstract, all independent of how it gets to a person.”
When Big Data Is Watching You

via TechCrunch
By Natasha Lomas
Is the answer to our feeble human minds needing to grapple with increasing quantities of big data to stand in a purpose-built room, immersed in complex data visualisations, while wearing an array of sensors that track our physiological reactions? A group of European Commission-backed scientists believe so.
The basic concept underpinning this research, which has attracted €6.5 million in European Union funding under the Future and Emerging Technologies Scheme, is that a data display system can be more effective if it is sensitive to the human interacting with it, enabling it to modify what’s on display based on tracking and reacting to human stress signifiers.
The project is called CEEDS — aka Collective Experience of Empathetic Data Systems — and involves a consortium of 16 different research partners across nine European countries: Finland, France, Germany, Greece, Hungary, Italy, Spain, the Netherlands and the UK. The “immersive multi-modal environment” where the data sets are displayed, as pictured above — called an eXperience Induction Machine (XIM) — is located at Pompeu Fabra University, Barcelona.
On the cognition enhancement side, the system can apparently respond to the viewer’s engagement and stress levels by guiding them to areas of data that are potentially more interesting to them, based on tracking their physiological signals and signposting them to click through to particular parts of a data set.
Again, the core concept driving the research is that as data sets become more complex new tools are required to help us navigate and pin down the bits and bytes we do want to lock on to. Potential use cases envisaged for the XIM technology include helping students study more efficiently.
Early interest in the tech is coming from museums, with the XIM concept offering a way to provide museum visitors with a personalised interactive environment for learning. Indeed, CEEDs’ tech has been used at the Bergen-Belsen memorial site in Germany for two years. The team says discussions are now ongoing with museums in the Netherlands, the UK and the US ahead of the 2015 commemorations of the end of World War II.
It also says it’s in talks with public, charity and commercial organisations to further customise “a range of CEEDs systems” to their needs — with potential applications including a virtual retail store environment in an international airport and the visualisation of soil quality and climate in Africa to help local farmers optimise crop yields.
The concept of an information system watching and reacting to the person absorbing information from it is interesting (plus somewhat creepy, given it is necessarily invasive), although more so if the system does not have to be room-sized — and require the wearing of an entire uniform of sensors — to function.
It’s easy to imagine a lighter-weight version of this concept which, for instance, could track what a mobile user is looking at via cameras on the front of their device and monitor additional physiological reactions by syncing with any connected wearables they have on their person, combining those inputs to judge how engaged they are with particular content.
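As a toy illustration of that idea (entirely hypothetical: the signal names, weights and thresholds below are invented for the sketch), blending a gaze-dwell estimate with a heart-rate reading into a single engagement score might look like this:

```python
def engagement_score(gaze_dwell_seconds, heart_rate_bpm, resting_bpm=65.0):
    """Blend how long the user looks at an item with how aroused they appear (toy weights)."""
    dwell_component = min(gaze_dwell_seconds / 10.0, 1.0)                    # saturate at 10 s
    arousal_component = max(0.0, min((heart_rate_bpm - resting_bpm) / 40.0, 1.0))
    return 0.7 * dwell_component + 0.3 * arousal_component

if __name__ == "__main__":
    print(engagement_score(gaze_dwell_seconds=6.0, heart_rate_bpm=82))       # ~0.55
```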
Whether that sort of tech will be used to generally aid human understanding remains to be seen. It seems more likely it will be leveraged by advertisers in an attempt to make their content more sticky.
Indeed, Amazon has already released a phone that has four cameras on the front — ostensibly to power a head-tracking 3D effect on the interface of its Fire Phone but well positioned to watch the reactions of the person using the device as they look at things they might be thinking of buying.
So, as we devise and introduce more systems that are designed to watch and monitor us and our reactions, it’s worth remembering that any complex system with eyes is only as impartial as the algorithmic entity powering it.
[Image: copyright specs.upf.edu]
GE Starts Rolling Out Pivotal’s Big Data Technology To Its Own Customers
By Barb Darrow
General Electric, which has touted the potential advantages of applied big data for a few years and last year put its money where its mouth was with a $105 million investment in Pivotal, is now ready to declare that it has started to reap the rewards.
Using Pivotal’s Big Data Suite and EMC’s appliances, GE built out its own capability first for its aviation group in 90 days, which then connected up to 25 airline customers to make use of all that data and analytics, according to Bill Ruh, VP of GE Software, the executive spearheading this effort. GE is a leading builder of aircraft engines and a key goal of using machine data and analytics is to provide better predictive maintenance.
“We want to get away from that alarm fatigue mentality,” Ruh (pictured above) said in an interview. “We want to know when a part is likely to break and watch usage patterns to see how parts can be more efficient and optimized,” he said. In this world, making a gas turbine one or two percent more efficient can add up to huge savings.
Aggregating data from 15,000 flights yielded 14 GB of information per flight, which could then be analyzed in a reasonable amount of time. Some of the lessons learned may seem simple — for example, jet engines that operate in harsh, dusty environments need to be washed more often — but that sort of insight can prevent big problems.
Using traditional methods it could take 30 days to sort through data required to figure out a maintenance issue. Now major analytics can be run in 20 minutes, he said. Having all that data — the rows-and-columns of relational data plus the not-so-organized non-relational stuff — in one repository and then being able to access it for analysis represents the “data lake” concept pushed by tech vendors of late.
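A data-lake query of that kind is conceptually straightforward; a hedged PySpark sketch (the column names, threshold and storage path are assumptions, not GE's or Pivotal's schema) might aggregate per-engine sensor readings to flag candidates for a wash:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("engine-maintenance-sketch").getOrCreate()

# Assumed layout: one row per sensor reading, with engine_id, flight_id, exhaust_gas_temp.
readings = spark.read.parquet("s3://example-bucket/flight-sensor-data/")

flagged = (
    readings.groupBy("engine_id")
    .agg(F.avg("exhaust_gas_temp").alias("avg_egt"),
         F.countDistinct("flight_id").alias("flights"))
    .where(F.col("avg_egt") > 620)          # hypothetical threshold suggesting fouling
)

flagged.show()
```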
“This is one of the first and most compelling examples of how customers get value out of data they couldn’t have done in a cost-effective way before –and it shows how much value can be gotten out of disparate data sources,” said Pivotal CEO Paul Maritz.
But it’s not the end at GE. The company’s healthcare division — which makes CAT scanners and other gear — is now rolling out the technology, with GE’s power generation, oil and gas, rail and transport groups to follow, Ruh said. The company is integrating its own Predix software with Pivotal’s technology as well.
Clearly GE, with its $257 billion market cap, can afford to pay big bucks for this sort of thing, but Maritz said he expects the technology will also be delivered via a SaaS model in the future so that smaller companies can benefit as well.
For more about Pivotal’s plan to jumpstart new big data applications, check out the video below of Paul Maritz’s talk at Structure Data 2014.
Google Shows off Mesa, a Super-Fast Data Warehouse that Runs Across Data Centers

By Derrick Harris
Google is taking the wraps off yet another impressive feat of database engineering, a data warehousing system called Mesa that can handle near real-time data and is designed to maintain performance even if an entire data center goes offline. Google engineers are presenting a paper on Mesa at next month’s Very Large Database conference in China.
The paper’s abstract pretty much sums up why Mesa was built and what it’s capable of:
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google’s Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails.
Essentially, Mesa is an ACID-compliant database (i.e., if someone queries it, they’re getting the right data) that’s built for speed, scale and reliability. It was, as explained above, designed to handle needs relating to Google’s ad business (serving internal users, as well as a front-end query service for customers) but can also function as a generic data warehouse system for other use cases.
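The paper's core mechanism, multi-versioned batch updates applied atomically to tables that aggregate values, can be illustrated with a toy in-memory model (a sketch of the concept, nothing like Google's actual implementation; a query at version n sees exactly the first n committed batches):

```python
class ToyMesaTable:
    """Toy model of a Mesa-style table: append-only versioned batches, additive aggregation."""

    def __init__(self):
        self.batches = []                 # batch i carries all rows committed as version i+1

    def apply_update(self, rows):
        """Atomically install a batch of (key, delta) rows as the next version."""
        self.batches.append(list(rows))
        return len(self.batches)          # the newly committed version number

    def query(self, key, version):
        """Aggregate a key's value as of a given version (repeatable, consistent reads)."""
        total = 0
        for batch in self.batches[:version]:
            for k, delta in batch:
                if k == key:
                    total += delta
        return total

table = ToyMesaTable()
v1 = table.apply_update([("ad_42:clicks", 10), ("ad_42:cost", 3)])
v2 = table.apply_update([("ad_42:clicks", 5)])
print(table.query("ad_42:clicks", version=v1))   # 10
print(table.query("ad_42:clicks", version=v2))   # 15
```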
If you’re wondering why Google had to build Mesa at all, given the myriad other database systems it has created over the years, the paper’s authors explain that, too:
BigTable does not provide the necessary atomicity required by Mesa applications. While Megastore, Spanner, and F1 (all three are intended for online transaction processing) do provide strong consistency across geo-replicated data, they do not support the peak update throughput needed by clients of Mesa. However, Mesa does leverage BigTable and the Paxos technology underlying Spanner for metadata storage and maintenance.
Google also has a system called Dremel, which is the foundation of its BigQuery service and is designed for fast ad hoc queries of read-only data. The paper notes various database systems built by vendors, as well as by Facebook and Twitter, but suggests they’re usually designed around bulk data loading, versus the minutes that loading process takes in Mesa. “A system that is close to Mesa in terms of supporting both dynamic updates and real-time querying of transactional data is Vertica,” the paper notes.
“However,” it continues, “to the best of our knowledge, none of these commercial products or production systems have been designed to manage replicated data across multiple datacenters. Furthermore, it is not clear if these systems are truly cloud enabled or elastic. They may have a limited ability to dynamically provision or decommission resources to handle load fluctuations.”

The paper goes into detail about how Mesa works — how data is stored (in tables), how data is queried and the distributed architecture — but one particularly interesting part has to do with the hardware. The paper notes that Mesa’s predecessor system ran on “enterprise-class” hardware that was expensive to scale. Mesa runs on Google’s standard cloud infrastructure, presumably on boxes designed and built by Google itself.
In the long run, Mesa could prove to be more than just another data warehouse system, though. Members of the Hadoop community — particularly Mike Olson and Doug Cutting of Cloudera — talk about watching Google to spot new directions that Hadoop might take, and a quality open source version of Mesa would likely prove very popular.
And then, of course, there’s the cloud computing angle. As Google continues to encroach on territory staked out by Amazon Web Services and Microsoft Azure, technology can matter as much as low prices. Google’s claim to fame has always been its cutting-edge distributed systems, and exposing something like Mesa as a service (in the same manner it has with BigQuery and Dataflow) would be a big point of differentiation between Google and its cloud peers.
Watch the video below to hear Google SVP and Technical Fellow Urs Hölzle discuss his company’s infrastructure at our Structure conference in June.
The Data Centers of Tomorrow Will Use the Same Tech Our Phones Do

By Peter Levine
This article originally appeared on WIRED.
The mobile revolution has spread beyond the mini supercomputers in our hands all the way to the datacenter.
With our expanded use of smartphones comes increased pressure on the servers that help drive these devices: The activity we see every day on our phones is a mere pinhole view into all that’s happening behind the scenes, in the massive cloud infrastructure powering all those apps, photo-shares, messages, notifications, tweets, emails, and more. Add in the billions of devices coming online through the Internet of Things—which scales with the number of new endpoints, not just the number of users—and you begin to see why the old model of datacenters built around PCs is outdated. We need more power. And our old models for datacenters are simply not enough.
That’s where mobile isn’t just pressuring, but actually changing the shape of the datacenter—displacing incumbents and creating new opportunities for startups along the way.
How the Mobile Supply Chain Is Eating the Datacenter
No one ever imagined when the first IBM PC with two floppy drives came out in 1981 that it would become the basis of the biggest change in datacenter technology since the mainframe. But that’s what happened: Until then, datacenters were run by Unix minicomputers. Once PC-based architectures started taking over, however, the datacenter started getting cheaper. The innovations around PC built the current datacenter model—ushering in technologies such as x86 server virtualization, Linux, and Windows Server—which became the de facto standard for cost, performance, and standardization.
Today, the mobile phone industry is where so much innovation has been concentrated—resulting in an entirely new class of components created just for this smaller form factor: flash memory, smaller CPUs, networking hardware, and so on. Which means lightweight processors (such as ARM) and low-cost, low-power mobile components are now becoming the foundation of the next-generation datacenter. In other words, the data centers powering our cell phones—from across the internet—will be remade using the same technologies that sit inside those phones.
All of this may seem counterintuitive at first. Surely more computing power in the datacenter would mean bigger and bigger CPUs—not smaller and smaller parts—packing in ever more transistors? But that’s where power and cooling may have reached their limit. While Moore’s Law put immense computing power in our hands, it also multiplied the sheer scale of data, apps, and computing resources being used around the world. Things are heating up (literally) inside the datacenter; we can no longer rely on big hardware to power the mobile revolution.
Still, the significant shift here isn’t just in going from bigger to smaller. It’s about eliminating all vestiges of the proprietary hardware used in networking and storage in favor of commodity components available through the mobile supply chain. It’s about this commodity hardware performing the function of proprietary systems today.
Picture a bunch of dirt cheap, cell-phone-like machines—all connected together with sophisticated software—instead of those power-sucking, refrigerator-sized boxes.
Ever seen a mobile phone with a fan or on-board cooling device? No, because they’re designed to operate at great temperature variations, which translates into power and cooling optimizations. Those power and cooling costs will therefore be drastically reduced from today, and these datacenters will use much less power and probably also less floor space per unit of CPU. The new mobile-defined datacenter will therefore be more efficient to operate and cheaper to make because the baseline hardware comes directly from the mobile supply chain.
Aggregating smaller parts together doesn’t mean sacrificing enterprise-level performance or processing muscle. This blueprint for infrastructure already influences the massively scaled applications and services that Facebook, Google, and Twitter operate, for example. It also means that datacenter architecture is no longer being defined by Wall Street. Why would companies want to copy the banking industry’s legacy architectures, when the likes of Google and Facebook can achieve mission-critical scale and serve billions a day using commodity hardware?
More importantly: What happens when the scale of the Googles and Facebooks now becomes much more accessible to everyone?
Mobile-Defined Datacenters Present a Huge Opportunity for Startups
Mobile is not only changing the composition of the datacenter, but is also forming the basis for the next generation of companies.
The point is not about startups challenging Intel (though that may happen). It’s about startups leveraging different hardware—ARM processors, flash storage, and networking—to build systems that are largely software enabled—thus bypassing legacy models, where the value was in fat margins on hardware and in on-premise installations.
Due to this shift in value from hardware to software in datacenter architecture, all businesses can now access every level of the computer-networking stack (through companies like Actifio, Coho Data, Cumulus Networks, and Mesosphere; these are just examples from our portfolio) without requiring the resources of a big player.
This doesn’t mean everything gets deployed from the cloud. But it does mean that everything becomes a service.
Because many of these mobile-inspired applications are delivered from the cloud and as-a-service (instead of on-prem), the shift to cloud infrastructure inside companies will happen almost unwittingly in many cases. The software-as-a-service (SaaS) model applied to infrastructure means enterprises can adopt these new tools at a departmental level—which means they spread without central IT departments even knowing it. Instead of slogging through long proof-of-concept phases and expensive beta tests, companies will try-before-they-buy directly from the cloud. Instead of waiting a year for product releases, infrastructure users too will now expect instant updates.
All of this adds up to reduced cycle times to adoption. Which means incumbents can’t keep up, especially because SaaS involves entirely different sales and customer service processes, revenue recognition models, engineering, R&D, and more. To avoid being passed over, legacy companies will have to adapt to the next generation of cloud infrastructure.
For startups, the combination of hardware accessible from the mobile supply chain, open-source building blocks, and SaaS means that—for the first time—the entire stack can finally be re-invented.
Before, startups had to fit into the legacy stack of computer-networking-database, with APIs at the top and bottom of every layer. If the dominant market player didn’t want a startup in that stack, all it had to do was restrict access to its APIs or cry out, “Sorry, not supported.” Game over.
Now, instead of having to slip around and get stuck inside this incumbent fat, startups can offer solutions at every level of the infrastructure stack…without having to be a part of it. They can bypass the existing stack. And there’s a whole new set of benefits for people as new startups build new applications and businesses on top of these new platforms. It’s an unprecedented opportunity.
Google Analytics Can Now Exclude Traffic From Known Bots And Spiders

By Frederic Lardinois
This article originally appeared on TechCrunch.
Google made a small but important update to Google Analytics today that finally makes it easy to exclude bots and spiders from your user stats. That kind of traffic from search engines and other web spiders can easily skew your data in Google Analytics.
Unfortunately, while generating fake traffic from all kinds of bot networks is big business and accounts for almost a third of all traffic to many sites according to some reports, Google is only filtering out traffic from known bots and spiders. It’s using the IAB’s “International Spiders & Bots List” for this, which is updated monthly. If you want to know which bots are on it, though, you will have to pay somewhere between $4,000 and $14,000 for an annual subscription, depending on whether you are an IAB member.
Once you have opted in to excluding this kind of traffic, Analytics will automatically start filtering your data by comparing the User-Agent of each hit to your site against the known user agents on the list. Until now, filtering this kind of traffic out was mostly a manual and highly imprecise job. All it takes now is a trip into Analytics’ reporting view settings to enable this feature and you’re good to go.
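Conceptually, the filter is just a comparison of each hit's User-Agent against a list of known crawlers. A hedged sketch of applying the same idea to your own server logs (the substrings below are a tiny illustrative sample, not the paid IAB list) looks like this:

```python
KNOWN_BOT_SUBSTRINGS = ("googlebot", "bingbot", "yandexbot", "baiduspider", "crawler", "spider")

def is_known_bot(user_agent):
    ua = user_agent.lower()
    return any(bot in ua for bot in KNOWN_BOT_SUBSTRINGS)

def filter_hits(hits):
    """Drop hits whose User-Agent matches a known bot or spider."""
    return [hit for hit in hits if not is_known_bot(hit.get("user_agent", ""))]

hits = [
    {"path": "/pricing", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"path": "/pricing", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
]
print(filter_hits(hits))   # keeps only the human-looking hit
```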
Depending on your site, you may see some of your traffic numbers drop a bit. That’s to be expected, though, and the new number should be somewhat closer to reality than your previous ones. Chances are good that it’ll still include fake traffic, but at least it won’t count hits to your site from friendly bots.