Saturday, July 31, 2004

Google's alternative computing platform

Is Google trying to replace the desktop PC? From an article by Rajesh Jain:
    What Google has done is to build an alternative computing platform. This is becoming obvious as it starts to roll out various services which go beyond just search: a shopping service, social networking, a blogging platform, email with a difference (not to mention plenty of storage), and a local search/yellow pages engine.

    Rick Skrenta [CEO of Topix.net] had this to say: "Google is a company that has built a single very large, custom computer. It's running their own cluster operating system. They make their big computer even bigger and faster each month, while lowering the cost of CPU cycles. It's looking more like a general purpose platform than a cluster optimized for a single application. While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming."

    Tim O'Reilly [CEO of O'Reilly Media] takes it further: "In a brilliant Copernican stroke, [Google's] gmail turns everything on its head, rejecting the personal computer as the center of the computing universe, instead recognizing that applications revolve around the network as the planets revolve around the Sun. But Google and gmail go even further, making the network itself disappear into the universal virtual computer, the internet as operating system."
I'm not sure. Sounds a lot like the hype surrounding Netscape in the late 1990s. But if the Windows desktop faces even the remote possibility of being threatened, I can see why Microsoft is responding so aggressively.

Friday, July 30, 2004

ipo.google.com

Interesting tidbits from the transcript on Google's IPO site, which only recently became available.

Larry Page (Founder) on search:
    You can find a lot of things you're interested in using Google, but we hope to make it much, much better over time, and to really understand the query that you type, to really understand all the information that's available, and while there's a lot of information on the web, not all the information in the world is currently on the web, and so making more information available, understanding it better, understanding what you want better, are all important goals for us ... You'll have an easier time finding the information you're interested in.

Eric Schmidt (CEO) on AdWords:
    Through Google AdWords, advertisers are able to deliver relevant ads cost-effectively to Internet users. The businesses these advertisers are building with Google, as a result of effective targeting, are changing the way advertising works and the way advertisers approach their markets. For example, we don't help advertisers find 24 to 36 year old males. Instead, we help them find consumers interested in purchasing flat panel televisions.

    Unlike the earlier Internet advertising efforts, we didn't just show any ad along with the search. We used special technology invented by Google, to take a search term and figure out which ads were most likely to be relevant. Whereas people tend to ignore untargeted ads, we found that people actually like these ads because they provide additional, relevant information ... That's really the secret of why the model has worked so well for us. We found a way to make advertising useful, not annoying.

George Reyes (CFO) on AdSense:
    AdSense for content focuses on serving targeted and relevant ads based upon the content that the user is reading. Similar to AdSense for search, we get paid each time a user clicks on an ad, and we share the majority of that revenue with our advertising network member. Our AdSense program has become our largest source of revenue, generating about 50% of our revenues through the first half of 2004, up from 24% in 2002.

To summarize, Larry says search is all about relevance (understanding the query and the data to help you find what you need). Eric says advertising is all about relevance (that advertising should be useful and relevant, not obtrusive and annoying). And George confirms my prediction that AdSense (ads placed on other websites) is the key to Google's revenue growth.

So, where does personalization fit into all this? Personalization is all about relevance, recognizing that what is relevant to me isn't the same thing that is relevant to everyone else. If Larry and Eric want relevance, they'll be looking to personalization.

Microsoft's IP strategy

The NYT reports that Microsoft is more aggressively pursuing patents:
    Microsoft said on Thursday that it planned to increase its storehouse of intellectual property by filing 50 percent more patent applications over the next year than in the previous 12 months.
Some are concerned about how these patents might be used. From the NYT article:
    Microsoft, the world's largest software company, increasingly regards the legal protection of its programming ideas as essential to safeguarding its growth opportunities ... Microsoft's stepped-up patent program, analysts say, will be watched closely in the industry to see if the company uses it mainly as a defensive tactic or as an offensive weapon to try to slow the spread of open source products.
And from a CNet article:
    Hewlett-Packard on Tuesday sought to distance itself from a June 2002 memo in which an HP executive said Microsoft planned to use patents as the basis for a legal attack on open-source software.

    "Basically, Microsoft is going to use the legal system to shut down open-source software," said Gary Campbell, then vice president of strategic architecture in HP's office of the chief technology officer, in a memo to several HP executives. "Microsoft could attack open-source software for patent infringements against (computer makers), Linux distributors, and, least likely, open-source developers."

Thursday, July 29, 2004

Microsoft "multisearch"

Ina Fried at CNet reports on Microsoft's blend of desktop and web search:
    Microsoft revealed the progress it has made in building search technology on Thursday when it demonstrated a tool that can comb both the Internet and a PC's hard drive. The technology is designed to quickly look through a hard drive, finding all the matches for a word from within documents, e-mails and even e-mail attachments. The version [Yusuf] Mehdi presented also returned Web results on the right side of the page.
Sounds like this is more than just Lookout.

[Thanks, Gary, for pointing out the article.]

Wednesday, July 28, 2004

Why personalized news?

Why might a personalized news site be more interesting and useful than a manually edited news site?

The problem with a manually edited front page is that everyone sees the same thing. While some top stories about big events are important for everyone to see, picking news stories otherwise is an effort to appease some mishmash of the interests of all readers. It's a compromise that results in mediocrity.

Personalized news focuses you in on the news that is important to you. Important top stories will still appear, but the site also surfaces stories that are important just to you. Have an interest in Linux? Findory News will learn that interest and emphasize news related to Linux. Never interested in sports? Findory News will adapt and deemphasize sports stories.

Personalized news provides a different front page to every reader. It focuses on your interests in a way that is impossible to reproduce by manually editing a page. By uncovering interesting articles from thousands of news sources, it will help you discover news you otherwise would have missed.

Try it out! You might not realize what news you're missing every day.

Tuesday, July 27, 2004

MSN Newsbot review

Microsoft has launched their personalized news product in the US. How does it compare to Findory News?

Using MSN Newsbot provides enough data to do some educated speculation about the system. MSN Newsbot does instantly keep track of articles read and use them immediately to change the small box of "Personalized News" headlines in the upper right of the front page. Aside from the small box of personalized headlines, the rest of the page appears to be unpersonalized. Some articles I read did not appear to be recorded. For example, I did a search on "newsbot" and clicked on three articles, only to find that none of the articles were recorded in my history or used for personalization. I also managed to lose my entire history once for no reason I could discover.

It's difficult to determine the underlying algorithms from inspecting the behavior of the site, but there appears to be strong evidence that it is mostly based on subject categories. Reading an article on business in Korea caused top headlines from Asia to appear. Surprisingly, even after deleting the article from my history, Asia top stories continued to be selected. Clicking on the "Why?" link gave the explanation that the personalized stories were picked because they are from the category "Asia-Pacific Latest".

Similarly, reading an article on Google's IPO caused the personalization to show me more top headlines from "Business:General" and "Business:Financial". Reading a science article on the effects of caffeine just produced more general science articles.

Using subject-based profiles is a well-known method of doing personalization, but it also has well-known problems. In particular, the personalization is not specific -- for example, showing just general business headlines -- and tends to pigeonhole people -- showing a reader only business stories and not picking up other cross-category interests. While it does have the advantage of being simple, experience in my past life shows that the predictive accuracy of this method is an order of magnitude lower than that of more fine-grained personalization techniques.
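To make the pigeonholing problem concrete, here is a minimal Python sketch of subject-based personalization. It is not MSN Newsbot's actual algorithm, which I can only guess at from the outside; the function names, the category labels, and the sample headlines are all made up for illustration.

```python
from collections import Counter


def build_profile(history):
    """Count how often each subject category appears in the reading history."""
    return Counter(category for _title, category in history)


def rank_headlines(profile, candidates):
    """Order candidate headlines by how often the reader viewed their category.

    sorted() is stable, so headlines from equally weighted categories keep
    their original order.
    """
    return sorted(candidates, key=lambda item: profile[item[1]], reverse=True)


# Hypothetical reading history: (title, category) pairs.
history = [
    ("Google sets IPO price range", "Business:Financial"),
    ("Korean exports rise", "Asia-Pacific"),
    ("Chip maker beats estimates", "Business:Financial"),
]

# Hypothetical candidate headlines.
candidates = [
    ("New Linux kernel released", "Technology"),
    ("Bank earnings roundup", "Business:Financial"),
    ("Typhoon nears Taiwan", "Asia-Pacific"),
]

profile = build_profile(history)
ranked = rank_headlines(profile, candidates)
for title, category in ranked:
    print(f"{profile[category]}  {category:20s} {title}")
```

The profile only knows that this reader clicked on "Business:Financial" twice. It cannot tell a Google IPO story from any other financial headline, and an interest in Linux never registers until the reader happens to click on a Technology story.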

If it is true that MSN Newsbot is merely using subject classifications for its personalization, Findory's personalization technology is considerably more advanced. Findory's algorithms combine statistical analysis of the article text and of users who viewed the articles with information about articles you previously viewed. Our personalized news is fine-grained. Our personalization is targeted closely to your interests while maintaining enough serendipity to enhance discovery. We help you read the news more efficiently and find articles you otherwise would miss. There is still nothing else like it out there.

Monday, July 26, 2004

MSN Newsbot launched

Chris Sherman reports that Microsoft's personalized news site, MSN Newsbot, has launched in the US. Interesting to see how it compares to Findory News.

Update: The press release from Microsoft.

Update: Press coverage of the launch seems to be focusing on MSN Newsbot as a "Google News killer". Certainly true that this represents a new front in the search war between Google, MSN, and Yahoo.

Update: Some good coverage on Poynter E-Media Tidbits and ResourceShelf.

Re-recruiting your knowledge nomads

Niall Kennedy comments on an excellent HBS article, "High Turnover: Should You Care?". Some excerpts from the article on commitment and productivity:
    Length of time in a company is the most common way of measuring employee commitment. And it is the least interesting and least helpful approach for managers. Far more important is the quality and quantity of work someone does when in a company.
And on retention and job satisfaction:
    Employers do often look at turnover as only a bad thing, [but] research has demonstrated that some turnover is healthy, indeed essential to organizational well being. But even more important is what managers do when they see turnover as a bad thing. To drive it down they often employ the wrong remedies.

    Managers who were asked to identify ways to retain workers came back with action steps like "increase salary" and "change his or her title." These are small changes [that] may keep an employee in a company for a couple of months, but they will not hold an employee for long, and little productivity will be gained. The managers we asked to identify ways to elicit commitment proposed deeper and more individualized action steps, like "find out what challenges make him or her tick" and "provide opportunities for learning on the job." ... Most employees seek to be valued and engaged.
Turnover is a symptom, not the problem. Keep people challenged, learning, and happy. It's good advice.

A lonely and difficult road

Warren Schultz at CareerJournal has a good reality check for aspiring entrepreneurs in his latest column, Entrepreneurship Is Often A Lonely and Difficult Road. While I think he underestimates the benefits of starting your own business -- my own experience has been fantastic -- Warren is right to warn of the costs and risks.

Sunday, July 25, 2004

Technorati's priorities

Steve Rubel criticizes Technorati for emphasizing marketing over engineering:
    I love Technorati, but this smells like dot com spirit all over again. Where's the moolah coming from to support a PR team of five? Hiring a PR firm before you can handle demand and squash bugs is looking for trouble. Hope they are ready for all the added attention.
I've personally had some problems with the performance, stability, and completeness of Technorati's search. I'm hopeful that Adam Hertz, Technorati's new VP of Engineering, will be addressing these issues quickly.

Update: After being frustrated by slow and failed Technorati searches again today, I think I'm starting to agree with Jason Calacanis.

Update: David Sifry (CEO of Technorati) acknowledges and apologizes for their scaling problems and outages.

Saturday, July 24, 2004

BugMeNot "registration form"

BugMeNot, a site I've mentioned ([1] [2]) that allows people to bypass mandatory registration requirements on websites, has an amusing joke registration form. As Cory Doctorow says, it cleverly "exemplifies many of the critical problems with registration on the Web."

Friday, July 23, 2004

Findory mentioned on Poynter

Steve Outing writes about Findory News:
    Personalizing news websites is a nice idea, but existing attempts I've found to be less than the ideal. Most require user registration, and then you can specify which sections (Business, Sports, etc.) you want on your home page. But there's another way, as demonstrated by Findory.com, a news site that debuted in March. It's a news aggregator (like Google News), and it's personalized (like My Yahoo!). The cool part about Findory is that whenever you click on an article link (which brings up the story from a news site, just like any other news portal) the site remembers what you've read and decides what other stories you might be interested in based on your previous clicks. It learns your interests throughout your current browsing session and subsequent ones, and presents headlines based on your past clicking behavior. (For example, I clicked on a Tour de France article; when I returned to the homepage of Findory.com, the article ranking had immediately included more bicycling stories out front.)
Steve Outing is a senior editor at the Poynter Institute for Media Studies, and an interactive media columnist for Editor and Publisher, a journal on the newspaper industry.

Thursday, July 22, 2004

Slashdot moderation and reputation

I've been frustrated with Slashdot moderation lately. The current system doesn't properly emphasize the most informative and interesting comments.

While the suggestions in Slash(dot) and Burn may help, I'm convinced that more is needed. In particular, I think the comment and moderation system needs to do more with reputation.

Currently, Slashdot has a simple reputation system called karma. Users with karma over a threshold have a higher initial score on their comments. High karma users also are occasionally given a few moderation points to raise or lower the score of other comments by +-1. Comments have a score in the range [-1, 5] and users can elect to filter all comments below a threshold.

But why these thresholds? Why not have everything work as a function of karma? For example, the starting score of a comment from a user with low karma could be 1.15, from a medium karma user 1.31, and from a very high karma user 2.38. Almost all users with positive karma could do at least some moderating, but moderations from a low karma user should barely nudge the score, perhaps by as little as +-0.04, while a moderation from a very high karma user should move the score by +-1.33.

Since high karma users mostly get their karma from posting interesting comments, giving high karma users more influence should improve the quality of discussions. In addition, using the full range [-1,5] instead of only integer values will allow more subtle differentiation between comments.
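Here is a rough Python sketch of what karma-weighted scoring might look like. It is only an illustration of the idea, not anything Slashdot does. The score values are the example numbers from above, but everything else is an assumption: that karma is normalized to the range [0, 1], the karma positions I attach to "low", "medium", and "very high" (0.1, 0.5, 1.0), the linear interpolation between them, and the function names.

```python
def interpolate(points, x):
    """Piecewise-linear interpolation through sorted (x, y) points, clamped at both ends."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


# Assumption: karma is normalized to [0, 1]. The y-values are the example
# numbers from the post; the karma positions (0.1, 0.5, 1.0) are invented.
START_SCORE = [(0.1, 1.15), (0.5, 1.31), (1.0, 2.38)]
MOD_WEIGHT = [(0.1, 0.04), (1.0, 1.33)]

MIN_SCORE, MAX_SCORE = -1.0, 5.0


def initial_score(karma):
    """Starting score of a new comment as a function of the poster's karma."""
    return interpolate(START_SCORE, karma)


def moderate(score, moderator_karma, direction):
    """Apply one moderation; direction is +1 (up) or -1 (down)."""
    if moderator_karma <= 0:
        return score  # users without positive karma cannot moderate
    delta = direction * interpolate(MOD_WEIGHT, moderator_karma)
    # Keep the real-valued score inside the existing [-1, 5] range.
    return max(MIN_SCORE, min(MAX_SCORE, score + delta))


score = initial_score(0.1)          # comment from a low karma user
score = moderate(score, 1.0, +1)    # moderated up by a very high karma user
score = moderate(score, 0.1, -1)    # nudged down by a low karma user
print(round(score, 2))
```

A real system would probably want a smoother curve than straight lines between three points, but the sketch shows the main design choice: no thresholds, just continuous functions of karma with the result clamped to [-1, 5].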

What do you think? Would this work? Are there other ways Slashdot moderation could be improved?

Wednesday, July 21, 2004

Growth and the future at Yahoo Search

Some tidbits on Search Engine Watch about Yahoo Search:
    With the release of its new search engine, Yahoo now powers over half of the US web searches - this is a dramatic shift in the market share within the industry. Yahoo now has 260 million users world wide and 100 million registered users.

    Yahoo sees personalized search as the future focus. The goal of personalization is to better understand the user intent. Currently, people have to type in extra words in their query to be more specific to get the results they want. With Personalized search, the search engine delivers relevant results with fewer words. For example, if people want a haircut and Yahoo knows that they live in midtown New York, Yahoo would be able to automatically supply haircutters in that area.
Yahoo's large registered user base is an advantage for personalization. Signed-in customers will not lose all their data if they lose their cookie. Signed-in users can see the same personalization from multiple computers. And many users may have provided demographic data and information about their interests.

The new command line

"The Location Field is the New Command Line" is a cleverly titled essay on web applications (like Google's GMail) and the disruptive impact they could have on Microsoft Windows.

Tuesday, July 20, 2004

Registration required

Rachel Metz at Wired writes about the annoyance of registration requirements at news web sites and tools like BugMeNot that thwart them. The article contains interesting quotes from people in the industry on why they require registration:
    Elaine Zinngrabe, general manager of latimesinteractive, which runs the Los Angeles Times' website, said the newspaper began requiring online user registration in June 2002 as a way to learn more about its readers and, it hopes, to drum up more advertising on the site. The Times asks readers to reveal things like their ZIP code, age, gender and income.

    Dipik Rai, a business manager with Knight Ridder Digital who runs online registration for some of the company's newspapers ... said Knight Ridder Digital is very upfront about why it's gathering data, which it uses to figure out who's using its site and to target advertising.
Both Elaine and Dipik claim that the reason they need registration is to understand their audience and for online advertising. There are three issues with this:
  1. Random sampling: Acquiring a demographic profile of your audience only requires a random sample of your customers, not forcing every single reader to fill out a lengthy form.
  2. Effective targeted advertising: I suspect a web site can generate higher targeted online advertising clickthroughs and revenue by matching ads to user behavior (the articles the reader has read) than from noisy and coarse-grained data on age, zip code, and income.
  3. Cost of registration: Required registration repels some visitors from your website, so you lose traffic and advertising dollars. The cost of lost revenue needs to be part of the cost-benefit analysis of required registration.
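To put a number on the first point, here is a small Python sketch using the standard sample-size formula for estimating a proportion. It assumes a simple random sample from a large audience, a 95% confidence level (z = 1.96), and the worst case p = 0.5; the function name is mine. It also ignores non-response bias, which any real survey would have to deal with.

```python
import math


def sample_size(margin_of_error, z=1.96, p=0.5):
    """Respondents needed to estimate a proportion within +/- margin_of_error.

    Uses n = z^2 * p * (1 - p) / e^2, which assumes simple random sampling
    from a population much larger than the sample.
    """
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)


for margin in (0.05, 0.03):
    print(f"+/-{margin:.0%}: {sample_size(margin)} respondents")
```

Under those assumptions, roughly a thousand randomly chosen readers is enough to estimate something like the share of readers in an age bracket to within about three percentage points, no matter how many millions of people visit the site.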

Monday, July 19, 2004

They think they can

An interesting article on innovation in smaller search engines. Mentions of Topix.net, Find.com, Vivisimo, and Eurekster, plus a few interesting comments from John Battelle.

Sadly, no mention of Findory News or Findory Blogory.

Netflix

Two interesting articles on Netflix this morning:

Phillip Torrone on Engadget argues that Netflix should provide open APIs to its system like Google and Amazon do.

Hacking Netflix talks about Blockbuster's leaked beta test of a Netflix imitator.

My biggest issue with Netflix is that their recommendations don't seem very effective. They're obviously trying to bias the recommendations heavily to the less frequently rented items in the back catalog, but they bias so heavily that the recommendations are never interesting to me.

It's a real shame. If the recommendations were any good, I might be reluctant to switch to a competitor, since I'd lose my nifty recommendations by making the switch. As it is, I don't have much reason to stick with Netflix, especially as they raise prices and appear to be having increasing problems with availability.

Saturday, July 17, 2004

Forrester Research picks Yahoo

Forrester Research predicts that use of Yahoo Search will exceed that of Google in Q1 2005. George Colony, Forrester chair and CEO, points out that:
    Great search site has three components: personalization, presentation, and quality of service. Of all the search engines out there, Yahoo is the only player that gets all three.
I'm not sure I agree that Yahoo is as good as Google on presentation or quality of service. But, Google's attempt so far at personalization has been weak. Yahoo clearly considers ([1] [2]) personalization to be the way for them to compete.

Friday, July 16, 2004

InfoWorld on RSS growing pains

Chad Dickerson at InfoWorld talks about scalability issues with RSS:
    InfoWorld.com sees a massive surge of RSS newsreader activity at the top of every hour, presumably because most people configure their newsreaders to wake up at that time to pull their feeds. If I didn't know how RSS worked, I would think we were being slammed by a bunch of zombies sitting on compromised home PCs. Our hourly RSS surge has all the characteristics of a distributed DoS attack, and although the requests are legitimate and small, the sheer number of requests in that short time period creates some aggravating scaling issues.
I commented earlier on a Wired article that expressed similar concerns about the scalability of the polling architecture of RSS.

There's a variety of ways to deal with this issue. The solution Chad seems to be suggesting is to randomize request times so that there aren't big spikes in traffic every hour at the hour. That's certainly a good idea. Clients should also respect the ttl (polling at the interval that is listed in the feed), support conditional GET, and handle 304 (not modified) responses to minimize the number of requests they make for the full feed.
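A polite client along these lines might look like the following Python sketch, using only the standard library. The function names and the 25% jitter are my own choices, not anything Chad proposed, and the sketch assumes the caller has already parsed the ttl (in minutes, per RSS 2.0) out of the feed and kept the ETag and Last-Modified values from the previous response.

```python
import random
import urllib.error
import urllib.request


def next_poll_delay(ttl_minutes, jitter_fraction=0.25, rng=random):
    """Seconds to wait before the next poll: the feed's ttl plus random jitter.

    The jitter spreads clients out so they do not all arrive at the top of the hour.
    """
    base = ttl_minutes * 60
    return base + rng.uniform(0, base * jitter_fraction)


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers that make a GET conditional."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise
```

When the server answers 304, the client has spent one small request and no bandwidth on the feed body, which is exactly the case that dominates for a feed polled hourly but updated a few times a day.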

But the primary solution will end up being caching. With the exception of personalized RSS feeds, RSS feeds easily can be cached. Web-based RSS readers like Bloglines and My Yahoo already only read the RSS feed once, cache it, and display it to multiple readers. But popular RSS feeds are also easily proxy cached just like web pages, reducing the load on the original source servers.
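The caching idea is simple enough to sketch in a few lines of Python. This is a toy illustration, not how Bloglines or My Yahoo actually work; the class name, the one-hour default, and the injectable fetch and clock functions are all assumptions made so the example is self-contained.

```python
import time


class FeedCache:
    """Fetch each feed at most once per max_age_seconds and share it among readers."""

    def __init__(self, fetch, max_age_seconds=3600, clock=time.monotonic):
        self._fetch = fetch            # function: url -> feed body
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}             # url -> (fetched_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # still fresh: no request to the origin
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body


# Usage with a stand-in fetch function that counts requests to the origin.
origin_requests = []


def fake_fetch(url):
    origin_requests.append(url)
    return "<rss/>"


cache = FeedCache(fake_fetch)
for _reader in range(1000):
    cache.get("http://example.com/feed.xml")
print(len(origin_requests))
```

A thousand readers cost the original server a single request. The same reasoning applies to an HTTP proxy cache sitting in front of desktop newsreaders.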

Thanks, RSS Weblog, for pointing out the InfoWorld article.

Update: An interesting discussion of RSS scaling over on Jeremy Zawodny's blog.

Update: Chad Dickerson follows up with an article summarizing all the suggestions he received. It's mostly the same as my suggestions above, but still probably worth skimming.

BusinessWeek on Info-Overload

Steve Hamm at BusinessWeek writes about the future of search. He starts by slamming MSN's new search engine:
    For proof that Internet search technology is still in its infancy, go to the Web site for Microsoft's new would-be Google killer. The company has invited people to try out a test version and suggest improvements. It could definitely use some. A search for Web pages related to the key words "Bill" and "Gates," for instance, starts off well with a couple of links to Gates' own Web pages on Microsoft's corporate site. But the results quickly devolve into a catch-all of random and even loony Gates-related material -- including a guide to hiring a William H. Gates III impersonator and a tongue-in-cheek spoof that tries to prove Gates is the devil.

    That's a bummer. Microsoft has spent $100 million on the new technology -- its attempt to catch up with Google's great leap forward in Web search. Clearly, a lot more has to be done.
The problem, as Steve explains well, is that "there's still too much gravel among the nuggets of digital gold."
    The burden is on the industry to turn the information tsunami into an unadulterated good. Until its engineers create much better ways of sorting through the sea of digital information that's flooding people's lives, consumers won't be able to get the most out of the Internet, and corporations won't receive full value for the nearly $8 trillion they have spent on info tech over the past decade. "We're just scratching the surface of what needs to be done," says Howard D. Wactlar, vice-provost for research computing at Carnegie Mellon University. "There's just so much information and so much of it isn't useful."

Microsoft acquires desktop search company

It's being widely reported that Microsoft has acquired a desktop search company called Lookout. Lookout's product adds a search bar to MS Outlook to allow easy searches of e-mail and files on the computer.

The Seattle PI predicted:
    Microsoft's acquisition of Lookout Software reflects the fact that next-generation search engines are expected to focus on "unified search" -- encompassing not only data from the World Wide Web but also other sources of information, such as files stored on a hard drive. Microsoft has said it is designing its MSN Search program as a single system to search across a variety of data types, including PCs, databases and corporate intranets.
This acquisition is a bit unusual because of the size of the company. Lookout is a two-person company with one freeware product and only a few thousand users, according to the Seattle Times.

Update: Lookout apparently uses the Lucene open-source search engine. This has led some to question what technology was acquired by Microsoft in this deal. By the way, it's interesting to see that other users of Lucene include Furl, Nutch, and Seruku.

Thursday, July 15, 2004

Security issues at Friendster

The latest Crypto-Gram newsletter points out a Wired article that documents some serious security holes in Friendster.
    Moore has written several Unix shell scripts that run on-the-fly background checks on people who use wireless networks in his neighborhood. With the help of the popular network-traffic analysis utility Netcat, his script "sniffs all the traffic on the Wi-Fi network, greps for email addresses, and looks them up on Friendster." Then the script sends Moore an email that includes a link to the users' Friendster profiles, along with their pictures and login IDs.

    At a time when it seems that nearly everyone has a Friendster account, Moore says, "You can do really creepy stuff. You can get the profiles on everyone in your local café, then see who their friends are, and just walk up to them and ask, 'Aren't you Tom's friend?'" More disturbing, Moore's toolkit allows him to get zip codes and last names, making it easier to track down the real-world addresses of his targets, thus opening up a whole new universe of creepiness. "You could do all sorts of mean things," he says.

    [Another trick] mines for information about anyone who looks at his profile and clicks through to his Web site. "I get their user ID, email address, age, plus their full name. Neither their full name nor their email is ever supposed to be revealed."

    Notified of the security holes, Friendster rep Lisa Kopp insists, "We have a policy that we are not being hacked." When I explain that, policy or no, they are being hacked, she says, "Security isn't a priority for us. We're mostly focused on making the site go faster."
"We have a policy that we are not being hacked." Wow. What more is there to say? Wow.

Default search in IE

The Google Toolbar now can override keyword search in the IE address bar, using Google instead of MSN Search.

Defaults have always been a big advantage to MSN Search. Windows and IE have dominant market share. Many novice PC users apparently don't even know what search they use; they just type keywords in the address bar and search, so they use MSN Search by default.

But now, the Google toolbar makes it easy to change the IE defaults to use Google instead.

Wednesday, July 14, 2004

Newspapers online and offline

Adam Penenberg at Wired has a recent article that starts by talking about how the New York Times articles and archives are poorly represented in web search engines like Google. But the article later moves into a discussion of online news in general and what traditional newspapers should think about for their news websites.

One issue is that online readers are much more numerous but much less dedicated than print subscription readers.
    The Times attracts 9 million unique visitors a month, while only about 1 million read the daily paper. On average, visitors spend about 43 minutes a month on the website, according to Nielsen/NetRatings, which gives them just enough time to sample a story or two that interests them. That's a fraction of the 28.2 minutes per day (3.4 times per week) a typical reader spends on the morning newspaper.
There are huge differences between the behavior of your online and offline readers. News websites should optimize for more casual and ephemeral online visits by taking advantage of the online format. Specifically:
    The Times should customize its content so that readers could pick and choose which stories they want based on their own particular interests, rather than having to wade through the site's table of contents.
What is being suggested here is personalized news. Take advantage of the online media format. Customize each page to each reader's interests.

Because each reader spends so little time online (1.5 minutes/day online vs. 28.2 minutes/day offline), online readers generate much less revenue for newspapers ($11/user online vs. $900/user offline). By responding dynamically to the needs and interests of their customers, online news sites can increase the amount of time their readers spend on their site and increase advertising revenue.
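A quick back-of-the-envelope check of those numbers, as a Python sketch. The inputs are the figures quoted above. The assumptions are mine: a 30-day month, and that the two revenue-per-user figures cover the same period.

```python
# Figures quoted from the article.
online_minutes_per_month = 43
offline_minutes_per_day = 28.2
online_revenue_per_user = 11     # dollars
offline_revenue_per_user = 900   # dollars

# Assumption: a 30-day month.
online_minutes_per_day = online_minutes_per_month / 30

time_ratio = offline_minutes_per_day / online_minutes_per_day
revenue_ratio = offline_revenue_per_user / online_revenue_per_user

print(f"online reading time: {online_minutes_per_day:.2f} minutes/day")
print(f"print readers spend about {time_ratio:.0f}x as long per day")
print(f"print readers generate about {revenue_ratio:.0f}x the revenue")
```

The per-day online figure comes out a little under the rounded 1.5 minutes used above. Under these assumptions, the gap in revenue per reader is even wider than the gap in reading time.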

Tuesday, July 13, 2004

Top online news sites

CyberJournalist.net posted the Nielsen/NetRatings of the "Top 20 Online Current Events & Global News" sites. CNN, Yahoo News, and MSNBC are dominant, with Google News at about 30% of the level of the market leaders.

Google vs. oFoto

Google acquired Picasa recently, signaling their entry into digital photo storage. Google's systems are designed to store large amounts of data across their cluster, so this business seems like a nice fit for them.

John Battelle speculates that travel will be next, but I have my doubts. The travel business is nasty, requiring interfacing with complicated legacy systems like the SABRE database. It's not clear to me that Google has any expertise or competitive advantage in this area.

Monday, July 12, 2004

Innovative recruiting from Google

A billboard on 101 near Palo Alto says nothing but:
    { First 10 digit prime in consecutive digits of e }.com
As described by Rupert Goodwins at ZDNet UK, solving this puzzle led to another, harder puzzle, then to a page for jobs at Google.
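For the curious, the first puzzle takes only a few lines of Python. This is just one straightforward way to do it, using the standard decimal module to get the digits of e and trial division to test for primes; the function names and the choice to look at the first 200 digits are mine.

```python
from decimal import Decimal, localcontext


def digits_of_e(count):
    """Return the first `count` digits of e as a string, starting with the leading 2."""
    with localcontext() as ctx:
        ctx.prec = count + 10  # a few guard digits so rounding cannot touch the result
        return str(Decimal(1).exp()).replace(".", "")[:count]


def is_prime(n):
    """Trial division; fast enough for 10-digit numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def first_ten_digit_prime(digits):
    """Slide a 10-digit window over the digit string and return the first prime."""
    for i in range(len(digits) - 9):
        window = digits[i:i + 10]
        # A window with a leading zero is not a 10-digit number.
        if window[0] != "0" and is_prime(int(window)):
            return int(window)
    return None


print(first_ten_digit_prime(digits_of_e(200)))
```

This prints 7427466391, which matches the widely reported answer to the billboard.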

Cute idea. Clearly it grabbed attention. I'm not sure the scheme will actually meet the goal of generating high quality resumes though. First, the selection criterion of solving these kinds of puzzles doesn't seem like a strong match with finding smart, talented people who will succeed at Google. Second, now that the final page is public knowledge, the filter, such as it was, is gone.

Nevertheless, you have to admire the creativity.
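
For the curious, the billboard puzzle can be brute-forced in a few lines. What follows is a minimal sketch, not anything Google published: the function names are made up for illustration, and it assumes that 200 digits of e are enough to contain the answer and that a window starting with 0 does not count as a 10 digit number.

```python
from decimal import Decimal, getcontext
from math import isqrt

def is_prime(n):
    """Trial division; cheap enough for 10-digit numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def first_prime_in_e(width=10, digits=200):
    """Return the first `width`-digit prime in the leading `digits` digits of e, or None."""
    getcontext().prec = digits + 10  # a few guard digits against rounding
    e_str = str(Decimal(1).exp()).replace(".", "")[:digits]
    # Slide a window of `width` digits across the expansion, left to right.
    for i in range(len(e_str) - width + 1):
        window = e_str[i:i + width]
        if window[0] != "0" and is_prime(int(window)):
            return int(window)
    return None

print(first_prime_in_e())
```

Trial division is the lazy choice here, but a 10 digit number has a square root under 100,000, so there is no need for anything cleverer.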

News.com Extra: Filtered by humans

CNet recently launched a general news site called News.com Extra. The tagline is "The Web filtered by humans, not bots," an amusing dig at news aggregators like Google News.

Of course, the tagline is absurd, implying equivalence where there is none. It's simply not possible to manually filter the hundreds of thousands of articles spidered daily by a news aggregator like Google News. Where an automated process can take advantage of the full breadth of news sources available, human editors will only have time to review and choose from a negligible fraction of the content available. The two are in no way equivalent products.

Not to mention that, if you want a news front page personalized to your interests, it can't be done using human editors.

Sunday, July 11, 2004

Microsoft cuts benefits

Microsoft recently cut employee benefits, and reactions from employees have been negative.

The Seattle Times quotes one person as saying, "Microsoft's benefits used to somewhat make up for what is a difficult place to work. Are we now going in the direction that it will be both difficult and unrewarding?" The article goes on to summarize the comments by saying, "The comments have one recurring theme: The cuts are leading to a drop in morale."

Cutting benefits can be dangerous. Typically, with any salary or benefit cut, your best employees (who have the best job prospects) leave at a disproportionate rate. It almost always has a negative impact on morale and productivity.

Moreover, benefits often are valued by employees at a level beyond the pure monetary value. One of the more interesting books I've read on employee compensation, Strategic Human Resources, makes this point:
    Benefits and perks can also be particularly powerful symbols of gift exchange, moving the employment relationship from one with purely economic connotations to something more along the lines of a kin or friendship relationship. Salary, wages, and even bonus payments all have the connotation of an economic exchange in which each party should attempt to extract the best possible (narrowly selfish) deal. Some forms of benefits and perks are of an entirely different flavor and can cause the worker to respond with reciprocal gifts or by internalizing the welfare of the organization.

    The psychological leverage associated with providing benefits is likely to depend on whether the employer is a pioneer in providing this perquisite or instead simply seen to be matching the competition.
It's interesting to note that Google offers a particularly exceptional benefits package.

Update: Well, look at that. Fourteen months later, BusinessWeek blames low morale and loss of key people, in part, on these benefit cuts.

The polarization of news

This week's Media Myopia column points to the increasing polarization of news. People tend to trust and read news sources that agree with their political beliefs. The article notes that:
    Perceptions of "media credibility" - that is, whether people think a particular news outlet can be trusted - are now more driven by ideology and partisanship than at any point in nearly 20 years of surveys.
I would think that the best way to mitigate this issue would be to use a news aggregator like Google News or a personalized news aggregator like Findory News so you see your news from a large number of sources and viewpoints. But I was surprised to see the column cite the book Republic.com to argue against personalized news:
    The Internet's ability to provide personalized news - to permit users to filter out those things they don't care about - [poses] a threat to democracy itself. Democracy depends in part on people's being exposed to information they would not necessarily have chosen for themselves. So, too, might the concept of gut rationality be endangered in a filtered world, where people see only what they want to see, hear only what they want to hear, read only what they want to read.
The author has a fundamental misunderstanding of the objective of personalized news. Personalized news does not seek to show a reader "only what they want to see." It does not pigeonhole a reader to one particular viewpoint. Rather, it helps the reader discover sources and articles that otherwise might have been missed.

While you may be a regular reader of the New York Times, Wall Street Journal, a news website, or your local paper, even the biggest news junkie only reads a few sources every day. Using Findory News, you now have a window into thousands of news sources you don't normally see, different viewpoints, different perceptions, different analyses. Rather than limiting your perspective, personalized news broadens your view, helping you find articles you never would have discovered on your own.

Friday, July 09, 2004

AltaVista code stolen by MSN Search employee?

Todd Bishop at the Seattle PI has the scoop.

While the prosecutor on the case said that the allegations currently "do not pertain to Microsoft," it's not known if any AltaVista source code ended up in the new MSN Search engine. AltaVista, by the way, is now owned by Yahoo.

The search wars keep getting more and more interesting.

Thursday, July 08, 2004

NYT review of the new MSN Search

The NYT State of the Art column has a lukewarm review of MSN Search's UI redesign and their new search engine. Some interesting snippets, starting with the MSN Search redesign:
    MSN [Search] still relies on search technology licensed from Yahoo. For many searches, the results are pretty much the same at Google. But do enough searches, spend enough months, and Google gradually re-earns its reputation for superior accuracy. The new MSN Search looks like Google but doesn't work like Google.
On paid placement:
    Unfortunately, Microsoft calls the separation of advertising an experiment, not a permanent change in policy. It seems to be trying on honesty in the mirror to see if people will find it attractive, rather than realizing that running a principled business is the way to win customers' trust. In short, "MSN will continue to evaluate the potential of paid inclusion to improve relevancy."
And, finally, a harsh review of Microsoft's early attempt to build a new search engine:
    You can try out Microsoft's extremely early version of this new search algorithm at techpreview.search.msn.com, but don't get your hopes up. First, the tech preview is very, very slow. Second, it should be more discerning; if you search for "motorized draperies," 13 of the first 15 results all come from the same company's Web site. Third, kill the sense of humor: the first result in a search for "1963 Oscar winners" is "1998 Oscar winners." (The actual 1963 winners don't even appear on the first page of 15 links.) In fact, on many searches that Google and Yahoo aced - "Richard Nixon's dog," "fertility over 40" and "Britney wedding photos," for example - the tech preview's first results page comes up empty-handed.
See also my earlier comments on the new MSN Search.

Wednesday, July 07, 2004

GMail having scaling trouble?

Some reports ([1] [2] [3] [4]) of trouble using GMail lately. Perhaps they've been a little too quick to give out those invites?

Slash(dot) and Burn

A paper by Cliff Lampe and Paul Resnick called "Slash(dot) and Burn: Distributed Moderation in a Large Online Conversation Space" has a fascinating analysis of data from Slashdot. Well worth reading. I particularly liked the table computing correlations between characteristics of a comment and the final moderation score of that comment.

Near the end of the paper, they propose some intriguing changes to Slashdot to address their primary concern, timeliness of the moderations:
    Alternative designs might cause treasures to be discovered more quickly and consistently, at the expense of a little more moderator effort. For example, there could be a special moderator's view of a conversation. It would hide comments below certain thresholds, as with the view presented to other readers. But comments the system had flagged as needing additional moderator attention would not be hidden. Recently posted comments and those with recent moderation would be flagged. Once a flagged comment had been presented to enough moderators, the system would infer from the lack of any explicit moderator action that the item was correctly classified and stop highlighting it for future moderators. All comments would reach their final score much faster, and the problems of uncorrected moderation errors and buried treasures would be reduced significantly.

Tuesday, July 06, 2004

VP at Yahoo on personalized search

David Mandelbrot, Yahoo’s vice president of search content, recently had this to say about long-term plans for Yahoo Search:
    Down the road we are focused on personalized search. We have a team of engineers working on discovering user intentions when they do a search. We’re looking at folders and clustering and other personalization techniques. Relevancy and freshness are huge priorities for us and our technology does a much better job in those areas as well as in comprehensiveness.

Monday, July 05, 2004

Blogory RSS feeds

Findory Blogory just launched a new and unusual feature: personalized, aggregated, and adaptive RSS feeds.

The RSS feed is personalized, so every reader gets a unique feed with articles that match his or her interests. The feed aggregates weblog articles from thousands of other weblogs, combining all the weblogs into one RSS stream and helping readers find and discover new articles and new weblogs. And Blogory's feed is adaptive, learning your interests as you read articles from the feed and changing the RSS feed to more closely match your interests.

To my knowledge, this is the first personalized, adaptive RSS feed. There's nothing else like it out there.

Many people might want to add it to their weblog reader as an easy way to read weblogs. For people who are overwhelmed with tens or hundreds of RSS feeds listed in their reader, Blogory might be a way of cutting through the glut, helping them find what they need each day without having to search manually through a long list of feeds. For others, Blogory may be a way of discovering interesting new weblogs.

Friday, July 02, 2004

Blogory in UK Guardian

Findory Blogory got a brief mention in the UK Guardian today. Interesting that the author of the article is Tara Calishain, lead author of the book Google Hacks and editor of the popular weblog ResearchBuzz.

Thursday, July 01, 2004

Text-only option from Google cache

Google Blogscoped noticed that Google now allows you to see a text-only version of cached pages. For example, try the normal and text-only cached versions of AnandTech. If a site is down or very slow, this is a convenient feature.

This reminds me of a prototype that I wrote a few years ago that allowed people to browse the web entirely using the Google cache. All URLs were rewritten on the fly to point back to the Google cache, so, once you started browsing the web using the tool, you stayed on Google cache for all your web browsing. It was a cute idea and worked fairly well, but I didn't pursue it beyond the prototype.
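
The link rewriting at the heart of that kind of tool might look roughly like this. This is a sketch under assumptions, not the original prototype: the cache URL prefix is a guess at the pattern rather than a documented API, the function names are hypothetical, and a regular expression over double-quoted href attributes stands in for real HTML parsing.

```python
import re
from urllib.parse import quote, urljoin

# Assumed cache URL pattern; treat it as a placeholder, not a documented API.
CACHE_PREFIX = "http://www.google.com/search?q=cache:"

HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)

def to_cache_url(url):
    """Wrap an absolute URL so that it points at the (assumed) cache endpoint."""
    return CACHE_PREFIX + quote(url, safe="")

def rewrite_links(html, base_url):
    """Rewrite every double-quoted href in `html` to go through the cache."""
    def repl(match):
        # Resolve relative links against the page they came from.
        absolute = urljoin(base_url, match.group(1))
        if not absolute.startswith(("http://", "https://")):
            return match.group(0)  # leave mailto:, javascript:, etc. untouched
        return 'href="%s"' % to_cache_url(absolute)
    return HREF_RE.sub(repl, html)

page = '<a href="/about">About</a> <a href="mailto:me@example.com">Mail</a>'
print(rewrite_links(page, "http://example.com/"))
```

Because every rewritten page comes back with its own links rewritten the same way, the reader never leaves the cache once they start.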

While you're at Google Blogscoped, check out their little EgoBot toy. I love the answer to "What is... personalization?"

I'm curious how EgoBot is implemented. Again, a couple of years ago, I had a prototype called "The Oracle of Google" that took a question, executed a query against Google, and then did some simple natural language processing to try to extract likely answers from the blurbs in the Google results. It worked surprisingly well. I'm guessing this does something similar.

If you're interested in question answering using search, there's some research work that might be useful to you. One of my favorite papers is by Kwok, Etzioni, and Weld. It's called "Scaling Question Answering to the Web" and it describes a more principled approach to this problem, leveraging WordNet and some other publicly available tools for the natural language processing. They had problems with scalability -- the natural language processing was quite expensive -- but it's a fascinating approach to the problem.
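
To give a flavor of how little is needed, here is a toy version of the extraction step. It is a sketch under assumptions, not the Oracle's actual code and certainly not EgoBot's: the snippets are hard-coded rather than fetched from a search engine, the stopword list and function names are invented for illustration, and plain word voting across blurbs stands in for the natural language processing.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "is", "was", "in", "on", "and",
             "to", "who", "what", "where", "when", "by", "for", "it"}

def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def likely_answers(question, snippets, top=3):
    """Rank words by how many snippets mention them, ignoring question words."""
    ignore = STOPWORDS | set(tokenize(question))
    votes = Counter()
    for snippet in snippets:
        # One vote per snippet, so a single long blurb cannot dominate.
        votes.update({w for w in tokenize(snippet) if w not in ignore})
    return [word for word, _ in votes.most_common(top)]

snippets = [
    "Moby Dick is a novel by Herman Melville.",
    "Herman Melville wrote Moby Dick in 1851.",
    "Melville's masterpiece, Moby Dick",
]
print(likely_answers("Who wrote Moby Dick?", snippets, top=2))
```

The redundancy of the web does most of the work: the right answer tends to show up in many blurbs, while noise words show up in only one or two.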

Update: Microsoft Research has been doing a fair amount of work on question answering. Here's a paper I found particularly interesting.

New MSN Search launched

It's being widely reported (Yahoo News, FT, and more) that MSN has launched its new search engine.

The best write-up I've seen is by Danny Sullivan at SearchEngineWatch. John Battelle also has a write-up focusing on paid inclusion.

If you want to try MSN's new search, it's available in the MSN Sandbox. MSN Search isn't expected to be powered by this search engine until sometime in 2005. As Danny Sullivan says:
    The new Microsoft search engine is NOT -- NOT NOT NOT -- being used at MSN Search. It can be confusing, because along with the search technology announcement, Microsoft has also announced a new look and feel for its MSN Search site. Despite these cosmetic changes, under the hood, MSN Search itself still beats with a Yahoo heart.
Reviews of MSN's new effort so far have been lukewarm at best. Danny Sullivan had this to say:
    The new search engine also leaves me with a "more of the same" feeling. It doesn't take search results anything beyond what Yahoo, Google or Ask Jeeves already do, and given their maturity, do better. In fact, a fast run of the tested queries through Gigablast -- a one man effort by Matt Wells -- makes you think MSN still needs to catch up to even that service.
But, to be fair, it's early. MSN Search may be an also-ran right now, but they've got a massive development effort behind them and intend to try to differentiate themselves soon. And it appears that personalized search will be their primary means of differentiation. From MSN's message about their commitment to search:
    A Personalized Experience. Your Search service should learn from you. What you like, what you read, where you live. Search should deliver results that are more personal and relevant to you.