Now 'Powered by Google'
NIH Changes Search Engine, Improves Relevance of Results
By Carla Garnett
Ever been looking for something specific online, typed a term into a search engine and received thousands of irrelevant "hits"? Those results would then require hours of sorting and sifting, often in vain, for the original item. As any Internet user will attest, the best thing about having so much data at your fingertips can also be the worst thing. It's an all-too-common dilemma shared by people seeking information as well as those providing it: how to effectively narrow the World Wide Web.
Recently the team that maintains the search function for NIH's web site put several finder products to a relevancy test. The result is that NIH's online searches are now "powered by Google," arguably one of the most popular tools on the net.
"In the early days, search products were more or less comparable to each other," explains Dennis Rodrigues, chief of the Online Information Branch in the NIH Office of Communications and Public Liaison, which has primary responsibility for the main NIH web site. "One product produced results pretty much as well as another. As search technology became better over the years, our expectations grew and the bar became higher. Over time, Google emerged as a far superior product."
With so many search products on the market, determining which one was best for the NIH community could have posed a problem. However, Rodrigues, who serves as the gatekeeper for data placed on the main site, found that there was really no contest among the best-known products.
"The Google Corporation set up a test for us," he said. "I used a battery of about 25 terms, looking for the ideal result. For instance, if I typed in 'melanoma,' what pages would be listed first? What would be among the top 10 results? We also looked into Inktomi's search product, which runs on the firstgov.gov web site. We thought we might be able to save money if we piggybacked on their use agreement. [However], we found that Google returns more relevant results for NIH's needs. It was the complete winner in every race we had."
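A test of the kind Rodrigues describes can be sketched in a few lines of code. The example below is only an illustration under stated assumptions, not the procedure NIH actually used: the search terms, page names, and the two stand-in engines are all hypothetical, and "ideal result" is simplified to "an expected page appears among the top 10."

```python
# Illustrative sketch only: terms, page names and engines are hypothetical.

def hits_in_top_k(results, expected, k=10):
    """Return True if any expected page appears in the first k results."""
    return any(page in expected for page in results[:k])


def score_engine(search, test_terms, k=10):
    """Count how many test terms return an expected page in the top k."""
    return sum(
        hits_in_top_k(search(term), expected, k)
        for term, expected in test_terms.items()
    )


def make_search(index):
    """Wrap a canned {term: [pages]} dict as a stand-in search function."""
    return lambda term: index.get(term, [])


# Hypothetical battery of test terms, each with the pages a tester hopes to see.
TEST_TERMS = {
    "melanoma": {"site/melanoma"},
    "asthma": {"site/asthma"},
}

# Hypothetical ranked results from two candidate engines.
ENGINE_A = make_search({
    "melanoma": ["site/melanoma", "site/other"],
    "asthma": ["site/news", "site/asthma"],
})
ENGINE_B = make_search({
    "melanoma": ["site/other", "site/news"],
    "asthma": ["site/asthma"],
})

score_a = score_engine(ENGINE_A, TEST_TERMS)
score_b = score_engine(ENGINE_B, TEST_TERMS)
```

In this toy setup the first engine finds an expected page for both terms and the second for only one, which is the sort of head-to-head tally a battery of about 25 terms would produce.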
According to Ginny Vinton, home page technical coordinator at NIH's Center for Information Technology and head of the team that keeps the NIH search engine in operation, there are more than 200 servers for the 242,000 documents that require indexing on the NIH site. Deciding to change the tool used to locate these items is no small undertaking.
On any given day, upwards of 19,000 searches are conducted on NIH's site, Vinton reports. The days logging the most searches are Tuesday through Thursday. NIH can trace a significant amount of its traffic to visitors who use global search services like Google or Yahoo.
"We had been thinking about various products for quite a while," Rodrigues admits, explaining that the search engine NIH had used for several years had begun to show its age.
In addition, the 3-person CIT technical team that tends to the main NIH site (Vinton, George Cushing and Bing Chao) sought a product that would be responsive to the questions and concerns of clients.
"I realized we should make the switch one day when I called the team and realized they were all already using Google to search the web," Rodrigues recalls, explaining that the search engine is "primarily to assist those using our public sites."
NIH launched its Google package on Feb. 9. The use agreement includes a back-up appliance for emergencies.
"We want to have a product ready to take over if the first one fails for any reason," explains Vinton. Both the primary and backup appliances are indexed once a week.
Another benefit to Google is that selected pages can be elevated in relevancy with relative ease. As the point of contact when people are unhappy with the NIH site, Rodrigues says that one of the complaints heard most often from NIH'ers was that they had conducted a search to see if their site came up on the return list. Frequently, because the word or title they were searching for was not recognized by the search engine, their site would not, in fact, be listed or would be so far down on the relevancy list that people looking for it would give up before locating the information.
"For instance, if the words on the page are in the form of graphics, a search engine will miss them," explains Rodrigues, who would then consult with Vinton and Cushing in an effort to find a solution that might improve the search results for that particular page. Because fruitless searches were beginning to occur with regularity, the troubleshooting process was becoming ever more time-consuming for team members, each of whom has other duties.
"We could adjust the algorithms so that additional weight was added to a title, keywords or a body of text," Vinton says, "but we never got the relevancy we desired."
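The kind of adjustment Vinton mentions, giving extra weight to a title, keywords or body text, can be pictured with a small sketch. This is an assumption-laden illustration, not the previous engine's actual algorithm: the pages, the field names, the weights, and the simple word-count scoring are all hypothetical.

```python
# Illustrative sketch only: pages, weights and scoring rule are hypothetical.

def weighted_score(page, term, weights):
    """Sum per-field occurrence counts of the term, scaled by field weight."""
    term = term.lower()
    return sum(
        weight * page.get(field, "").lower().split().count(term)
        for field, weight in weights.items()
    )


def rank(pages, term, weights):
    """Return page names ordered from highest to lowest weighted score."""
    ordered = sorted(pages, key=lambda p: -weighted_score(p, term, weights))
    return [p["name"] for p in ordered]


PAGES = [
    {
        "name": "page-1",
        "title": "Melanoma Overview",
        "keywords": "skin cancer",
        "body": "melanoma is a skin cancer",
    },
    {
        "name": "page-2",
        "title": "Newsletter",
        "keywords": "news",
        "body": "melanoma melanoma melanoma research update",
    },
]

# Favoring the title lifts the overview page; equal weights favor raw repetition.
title_heavy = rank(PAGES, "melanoma", {"title": 5, "keywords": 3, "body": 1})
equal = rank(PAGES, "melanoma", {"title": 1, "keywords": 1, "body": 1})
```

Shifting the weights changes which page comes first, yet a count-based scheme like this still rewards repetition over quality, which hints at why tuning alone might not deliver the relevancy a team wants.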
Over time, Rodrigues adds, web authors who create pages with search engines in mind will be pleased with Google's ability to rank their pages in ways that offer the most benefit to users.
"One of the things we learned with the previous engine was that it didn't always follow convention," Rodrigues says. "Often it was counterintuitive to the way people would use it. We wanted a product that uses natural language to come up with reasonable results. Another consideration we had was that the product have an effective technology so that we could create web pages that work with it."
As web technology continues to develop at an exponential pace (according to recent tech news, Google already has a new rival in the search field, Grokker), the next dilemma for Rodrigues becomes how long NIH sticks with Google.
"Our goal is to find solutions that are reliable, robust and provide the best possible results for our users," he concludes. "Our next move depends on how long Google can meet the needs of our customers."