Founder & Product Architect of Powerset

Powerset and PARC join forces to revolutionize search

The NYT today covers a fundamental and very revealing piece of the Powerset story, one that we have kept under wraps for over a year now. Powerset has completed an exclusive licensing deal for all of PARC’s technology and IP related to its work in natural language understanding. This deal has been 15 months in the making and has involved a tremendous amount of dedication from us at Powerset and from our counterparts at PARC. We believe we have emerged from this period with a strong, solid, mutually beneficial, and ongoing relationship, one that will foster the success of Powerset and see the culmination of over 35 years of unwavering work and commitment by PARC in advancing the state of the art on the fundamental problems of Natural Language Understanding.

The PARC Natural Language Technology

Think about this. A small, dedicated, unwavering, and incredibly competent team of world-class scientists, researchers, and engineers has for years worked on some of the hardest problems in AI and computer science. Year after year, they refrained from assuming that what they had done was already good enough; instead they tackled and solved, one after the other, each fundamental problem they knew to be outstanding, before they could finally stand back and say: “You know, I think we may be ready to really try and bring this to the next level.” That, briefly, is the vision and mantra that has guided the NLTT team at PARC for over 30 years as it built some of the most advanced Natural Language technology in the world. And it was the marriage of that degree of technical excellence with the vision that Barney, Steve, I, and the rest of the team at Powerset brought to PARC, the vision of a new, revolutionary kind of search engine, that has taken us to where we are today, with the commitment to change the face of search.

Ron Kaplan is now Powerset’s CTO/CSO

As amazing as it sounds, Ron Kaplan, the founder and leader of the Natural Language group at PARC for over 30 years, joined Powerset last July as Powerset’s Chief Technology Officer. Having him on board for these past months has been a tremendous boost to Powerset’s technical direction and development, and a phenomenal asset. It has also been a challenge to maintain a somewhat low-key profile on his move from PARC to Powerset. While the negotiation was in the works, we focused on making sure that things proceeded smoothly and without distractions, and we really wanted the connection between us and PARC to remain somewhat of an unknown. It is amazing to now be able to tell the world what an incredible opportunity we have created. Along with Ron – as part of the relationship – comes the expertise and experience of a world-class team of scientists and engineers who complement Powerset’s own amazing team, ensuring that together we can bring this technology to the world. It is indeed a once-in-a-lifetime opportunity and a challenge that we are all undertaking with the utmost excitement.

The missing link: Natural Language as the breakthrough in Search

PARC and Powerset now have a tremendous relationship and can move together toward success. But there is another piece of the Powerset story that goes beyond natural language processing itself. Powerset secured the rights from PARC to commercially use what we believed was the most powerful linguistic technology in the world. But we also added something very special, developed in-house and unique to Powerset: we devised a completely new kind of search core, one that turns the traditional keyword-search core on its head by understanding the meaning of text and capturing the relationships between concepts, while remaining efficient. This is probably the most important innovation Powerset has created, and the one that we believe – in conjunction with PARC’s groundbreaking technology – will enable us to deliver on the very bold promises we have made.

I am incredibly excited by the opportunity, and humbled by the size of the challenge. We have access to incredibly powerful natural language technology, and I believe we have broken the mold and created a new kind of core, targeted at the future of search but able to benefit from the advances and lessons learned from the big innovators in search of the late 90s. That, coupled with the fact that much of the distributed computational platform is becoming increasingly commoditized, on both the software and hardware sides, makes this a unique moment in history.

And yet Powerset wouldn’t have nearly the same chances without our amazing team. We have focused on attracting the most talented, smart and experienced people in the industry and we are creating a new center of excellence for search experts and for natural language experts alike. I am blessed to be working with such incredibly smart and motivated people. They are the ones who are truly building the potential for success of Powerset.

The Inevitable Destiny of Search

Search has become the primary conduit of access to the world’s information. Every activity that takes place on the web, be it educational, transactional, or for entertainment, in a large number of instances begins with, and thrives on, search. John Battelle has called ‘search’ the new OS, the environment in which everything takes place. And yet there is more to be unlocked on the path to better search. The main problem is that the core capabilities search offers have not fundamentally changed since the innovations that companies such as AltaVista, Overture, and Google brought to the world in the late nineties. We are still stuck with a limited model of how to represent and retrieve information: we – as users – must anticipate the words in the documents that will satisfy our need for information, and must do so in an unnatural way. We have to convert our need, which we would naturally express as a search phrased in our very own language, into keywordese, which is less powerful, less usable, and less natural.

The search community has done a heck of a job squeezing value and effectiveness out of keywordese, but it’s time for the next step.


The all new Powerset Blog

It’s going to be a week full of new announcements for Powerset and for many others, as the Web 2.0 conference opens its doors on Tuesday. Over the weekend we launched a revamped company website, with much of the same content but with a better visual design and with lots of room for future enhancements. As co-founder Steve Newcomb mentions in his post, we decided to start a new company blog, creating a better channel for spreading news about Powerset and for responding to comments and questions about us.

As Steve says,

Why did we post a new web site, but not reveal very much? The main reason is that we are about to make several announcements at Web 2.0 next week (HINT: Series A) and we wanted to create our blog area so that we can have a forum to discuss various topics. 

In the last blogstorm, our website was not well equipped to handle the inquiries and discussions that were taking place. We have learned from the past, and now we have a better way to communicate with our audience.

Most importantly, we created a new press area for Powerset, where we will be posting important announcements and press releases. The first announcement was made yesterday, when we disclosed the identities of the original A-list angel investors who believed in Powerset and provided their backing and invaluable advice. On that theme, our first Powerset blog post speaks of the importance of all of the people involved in the company, from employees to investors to advisors.


Jeff Bezos and Amazon EC2

This week’s Business Week cover story (Jeff Bezos’s Risky Bet) features Amazon CEO Jeff Bezos on the occasion of the announcement of the latest addition to the Amazon Web Services initiative: the Elastic Compute Cloud (EC2).

EC2 is a very interesting service which allows anybody to leverage the same reliable and massively distributed computing grid that powers Amazon’s own operations.
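To make that concrete, here is a minimal sketch of what asking EC2 for a machine looks like programmatically. This is an illustration only: it uses today’s boto3 Python SDK rather than the command-line tools available during the EC2 beta, and the AMI ID, key pair name, and instance type are hypothetical placeholders, not anything Powerset actually uses.

```python
# Minimal sketch: request a single EC2 instance, wait for it, then release it.
# Assumes AWS credentials are configured in the environment; the AMI ID, key
# pair, and instance type below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Ask Amazon's grid for one machine of a given size.
response = ec2.run_instances(
    ImageId="ami-12345678",      # placeholder machine image
    InstanceType="m1.small",     # placeholder instance size
    KeyName="my-keypair",        # placeholder SSH key pair
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]

# Wait until the instance is running, then print its public address.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
description = ec2.describe_instances(InstanceIds=[instance_id])
instance = description["Reservations"][0]["Instances"][0]
print(instance_id, instance.get("PublicDnsName"))

# Capacity is rented by the hour; give it back when the spike is over.
ec2.terminate_instances(InstanceIds=[instance_id])
```

The point of the sketch is the economics it implies: capacity that would otherwise require buying and staffing a data center can be requested and released on demand.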

The article gives great coverage of the importance of EC2 and its sister services in Amazon’s new strategy. It also mentions, amongst other things, Powerset’s use of EC2 as part of our technology strategy. Steve will be coming out with a new post with more details on EC2 and Powerset after Jeff’s talk on Wednesday.

From the Business Week piece, here are the relevant sections:

Consider Powerset, the secretive search startup backed by A-list angel investors, including PayPal Inc. (EBAY) co-founder Peter Thiel and veteran tech analyst Esther Dyson. Co-founder and CEO Barney Pell harbors ambitions of out-Googling Google with technology that he says would let people use more natural language than terse keywords to do their searches. By analyzing the underlying meaning of search queries and documents on the Web, Powerset aims to produce much more relevant results than the current search king’s.

Problem is, Powerset’s technology eats computing power like a child munches Halloween candy. The little 22-person company would have to spend more than $1 million on computer hardware, two-thirds of that just to handle occasional spikes in visitor traffic, plus a bunch of people to staff a massive data center and write software to run it. That’s when Pell heard about Elastic Compute Cloud. He was sold. Based on tests so far, using the Amazon site for part of the company’s computing power could cut its first-year capital costs alone by more than half.

… Highly anticipated search upstart Powerset Inc. plans to use the Amazon computing service, even though it’s still in test mode, to supplement its own computers when it launches its service sometime next year.

EC2 is a great piece of a larger story about how Powerset is being enabled to focus on its core competence – developing the technology that will bring to market the next breakthrough in search – without reinventing the wheel.


We are all Natural Language Searchers…

The debate has hardly died down, judging from the number and fervor of the comments and responses generated by Matt Marshall’s first and second coverage of our approach to search. The posts and comments that popped up pretty much everywhere shared a few common themes: “natural language has been tried before and failed”, “keyword search is enough for people’s needs”, and “it doesn’t matter anyway, because users won’t change their behavior”. I thought it was worth expressing my own view on these themes, and explaining why natural language is the inevitable destiny of search.

Can Natural Language really make a difference?

Powerset is a natural language search company. Yes, the road is paved with failed attempts at delivering on the long-sought grail of information retrieval. But whether or not others have succeeded at this task, it surprises me how many out there are really convinced that search is a solved problem. And even amongst those who realize that search is not as good as it could be, many are of the opinion that there is nothing limiting about the expressiveness of keywordese.

I can summarize most of the arguments made in support of keyword search around three major axes:

* The vast majority of queries are only a few words long

* There is only so much semantic intent that one could extract from queries such as “Britney Spears”, “beach”, or “digital cameras”

* Whether or not one could build a better search engine is a moot point; delivering better results to queries formulated in natural language won’t work, because it would require users to change their behavior.

I’ll address these topics in order, starting with the last one.

Changing Users’ Behavior

I am probably a pretty good keyword searcher, and yet I have no good way to describe to somebody how I come up with successful sets of keywords that deliver the information I seek; I have learned and refined my technique over time. The fact of the matter is that all successful searchers have adapted to the limits of the technology. We have trained ourselves – out of necessity – to translate our needs into keywords as successfully as we can. And yet, many recognize the limits of search today.

Human history is characterized by an interesting tension between innovation and adaptation. In the early days of human history the rate of innovation was much slower, and the rate at which we adapted to the environment and its constraints much higher. Over time innovation caught up, and in this day and age we are much more likely to – say – devise a cure for a fatal pathogen than to adapt to its effects. Still, when no better solution is available (and especially when the adaptation is only behavioral), people are effective at devising efficient strategies to meet their needs, and would much prefer to do so than to give up on their goals altogether. As a consequence, many people have become masters in the art of keywordese searching. Not unlike developing a grunting pidgin language, as Barney puts it, to communicate with someone with whom you share no common language: sooner or later, you would advance enough to say some things.

Yes, no one from Google sat users down and told them “two words only and no conjunctions”, but reinforcement training is much more effective than any instruction. People don’t bother giving search engines more (in terms of words and context) because they’ve learned that in return they get less.

So how hard is it to untrain users? How hard is it to change a behavior? It depends. There are three major metrics that one could use to describe the ease of (or resistance to) changing users’ behavior.

1. The cost of changing one’s behavior. For example, it seems clear that there is some actual cost in typing a number of extra words or characters. On the other hand, a fundamental dimension of language-based communication is its conversational aspect. This is probably a topic better left to a different post, but it’s reasonable to expect that an effective conversational interface may actually reduce the cognitive load associated with typing, as context is preserved from utterance to utterance.

2. The benefit associated with the change. One could reasonably argue that a significantly better search experience, with better and more precise results, more often, would constitute a benefit worth changing one’s behavior.

3. The pre- and post-change energy states. This is probably what is fundamentally different about switching to searching in natural language: it’s not really a switch. How many people do you know who formulate thoughts in keywords? A change from a less intuitive practice, harder to understand and learn, to one that is more natural and easier to adopt is clearly a change with the flow, not against it.

The central idea in bringing consumers a natural language search experience that (actually) works is that the change is aligned with what’s natural to people: it is a change from an unnatural way of expressing intent (one that works only as well as today’s technology allows) to one that is more natural and more easily converted into tangible, readable, typeable form.

But that’s not the end of the story. It’s also a change from an impoverished language, one that loses information and expressiveness in the conversion from intent to form (keywordese), to a highly expressive and powerful language, one that everyone is naturally inclined to use.

The Query Universe: keywords, questions and all shades in between.

I completely recognize that there are cases in which your two-word vanilla search will do just fine in expressing what you need. Maybe you don’t really know what you want. Maybe you type “Jane Austen” and you just want to be presented with a carousel of general-interest documents that will teach you more about Jane Austen. And yet, what if what you wanted was to know about books that describe and review Jane Austen’s portrayal of the clergy? An encompassing search experience should satisfy users in both these cases, with as little effort as possible. Natural language search doesn’t mean impinging on what’s intuitive and natural to people by forcing some artificial constraint of semantic or syntactic well-formedness. It means using any and all linguistically relevant content that users do include in their queries, and rewarding them for doing so, thus encouraging experimentation and a return to a more natural way of phrasing intent than a bunch of keywords.

Note that I am purposefully talking about “natural language queries”, not “questions”. Questions are just one type of natural language query. A search engine that only answered questions, even if it did so really well, would address just a fraction of what people use search engines for. And many have made that mistake.

AskJeeves opened the door to this market somewhat. Before they retired Jeeves, the folks at Ask figured out that users would like to come and ask questions on their website, and the initial user response proved them quite right. The AskJeeves developers thought that an editorial approach, manually compiling the best answers to the big head of the questions that users asked, would be a winning strategy. It wasn’t. Although this was some time before the term “long tail” entered the vocabulary of search-savvy folks, it should be clear why. At the time, people didn’t realize that much of the value of search was not in the most common queries, but rather in the long tail of queries. People search for all sorts of things, and once they think they can use language for some things, they’ll want to do so across the board.

As a matter of fact, others do realize the importance of questions today. Go to Google and ask “Who shot Lincoln?” and you get a nice “one-box answer” with Lincoln’s murderer. Google shies away from the editorial approach and instead mines a nice set of sources that can provide quick answers to questions. But in reality, both approaches are limited, hackish, and brittle. Ask Google “Who murdered Lincoln?” and the one-box disappears. Still, why doesn’t Google publicize this feature much? Probably because telling users “come and ask a question, and we’ll get you some answers, some of the time; but don’t use language in all of your other searches because it’ll get worse…” doesn’t seem like a very consistent marketing message.

The long tail of failed queries

Danny Sullivan cites the remarkable lack of long queries in Google Zeitgeist as an indication of what users will do with effective natural language search. Query logs are helpful, but the data can be misleading. The data so far about short queries, and the past failures of natural language attempts, are no indication of what users will or won’t really do, because users have never yet been presented with the possibilities of true natural language search. What we do know is that users are attracted to the idea of being able to search using language, and that they do so occasionally. Why would Google et al. bother to include some language-like features in their one-box results otherwise?

Moreover, it might be that looking closely at the bottom of the query logs (the long tail), you’d find many longer, language-rich queries, all of them more or less about “Britney Spears”, but different enough not to add up to a high total of identical strings. What’s really interesting, continuing our speculation, is that each one of those long queries could very well be the first attempt of a user who was really interested in something very specific about Britney. The only catch is that the vast majority of those queries likely failed to return what users wanted – or returned nothing relevant at all – since keyword search engines often get confused by the additional “noise” that natural language introduces into their statistical models. So, who could blame them, sooner or later some or many of those users probably just threw their hands up and searched again simply for “Britney Spears”. One can see how the inflated number of short searches at the top of the query logs might very well come from the total failure of keywordese search engines to return what users really wanted.
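As a toy illustration of that point (with invented queries and made-up counts, not real log data), counting a log by exact string makes a single terse head query tower over everything, while many distinct language-rich queries about the same topic each register only once and disappear into the tail:

```python
# Toy illustration with invented queries: exact-string counting hides the
# long tail of distinct natural-language queries about the same topic.
from collections import Counter

query_log = (
    ["britney spears"] * 5000                       # terse head query, typed over and over
    + ["britney spears 2004 tour set list"] * 3     # distinct language-rich long-tail queries
    + ["what label did britney spears first sign with"] * 2
    + ["songs britney spears wrote herself"] * 2
    + ["who produced britney spears first album"] * 1
)

counts = Counter(query_log)

# The head query dominates any "top queries" list...
print(counts.most_common(1))   # [('britney spears', 5000)]

# ...while each specific long query counts only once or twice,
# even though collectively they express very precise intent.
long_queries = {q: c for q, c in counts.items() if len(q.split()) >= 4}
print(len(long_queries), sum(long_queries.values()))
```

The numbers are fabricated, but the mechanism is the point: a Zeitgeist-style list of top identical strings is structurally blind to this kind of diversity.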

Managing Users’ Expectations

And yet, the obstacles are significant. What’s challenging is not that users are satisfied with the status quo, or that they won’t change their behavior. Rather, the problem stems from the fact that people know language very well. The real risk is that bringing real language understanding capabilities to search might generate unsatisfiable expectations. Search is like air: just as we need oxygen, we have an insatiable need for information. When this need is combined with the ease of use of natural language, the stakes become much higher. At Powerset, we realize this. We know that in order to be successful, we must always effectively communicate to users the power and the limits of our technology, managing their expectations and striving every day to amaze them with something they previously thought to be impossible.

The future of search

Finally, a teaser for a future post. A lot of innovating is about staying ahead of the curve: as technology progresses and speech technologies mature further than they have to date, are we really going to be performing searches by uttering disconnected keywords? At that point, function words and word relationships will necessarily be omnipresent, and one would be foolish to ignore their importance… Language, and natural language search, will likely become pervasive once speech technologies mature and we get used to accessing information in mobile and otherwise encumbered environments (in cars, for example).
