How search engines work: general principles, snippets, the inverted index algorithm, page indexing, and the features of Yandex.

Good afternoon, dear readers of my SEO blog. This article is about how the Yandex search engine works: what technologies and algorithms it uses to rank sites, and how it prepares answers for users. Many people know that this flagship of Russian search sets the tone on the Runet, owns the largest database in Eurasia, handles the content of more than a billion pages, and seems to know the answer to any question. According to Liveinternet data for August 2012, Yandex's share in Russia is 60.5%, and the portal's monthly audience is 48.9 million people. But the most important thing for us bloggers is how the search engine receives our queries, how it processes them, and what comes out as the result. Knowing and understanding this makes it easier both to use all of Yandex's resources and to promote our blogs. So I propose to look with me at the most important technologies of the best search engine on the Runet.

When an Internet user first turns to a search engine for information, he may have one question: "How does the search work?" But once he gets his answer, the question often changes to another: "Why so fast?" Indeed, why does searching for a file on one computer take 20 seconds, while the result of a query across an entire network of computers around the world appears within a second? The most interesting thing is that both questions (how the search happens and why it takes one second) have the same answer: the search engine prepared for the user's request in advance.

To understand how Yandex, like other search engines, operates, let's draw an analogy with a telephone directory. To find a phone number, you need to know the subscriber's last name, and any such search takes a minute at most, because the pages of the directory form one continuous alphabetical index. But imagine if the directory were ordered by the phone numbers themselves. Such a search would drag on much longer, and the numbers would stay before the searcher's eyes for a very long time afterwards. 🙂
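
To see why the order of the index matters, here is a minimal Python sketch (the directory data and function names are mine, purely for illustration): looking up by the sorted column is a binary search, while looking up by the unsorted column forces a scan of every entry.

```python
import bisect

# A "telephone directory": entries sorted by surname, like the pages of the book.
directory = sorted([("Ivanov", "555-0101"), ("Petrov", "555-0102"),
                    ("Sidorov", "555-0103"), ("Smirnov", "555-0104")])
surnames = [name for name, _ in directory]

def find_by_surname(surname):
    """Fast lookup: binary search over the sorted surname column."""
    i = bisect.bisect_left(surnames, surname)
    if i < len(surnames) and surnames[i] == surname:
        return directory[i][1]
    return None

def find_by_number(number):
    """Slow lookup: the book is not ordered by number, so scan every entry."""
    for name, phone in directory:
        if phone == number:
            return name
    return None

print(find_by_surname("Petrov"))   # O(log n) - like searching by last name
print(find_by_number("555-0103"))  # O(n) - like searching by the number itself
```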

Likewise, a search engine lays out all the information from the Internet in a form convenient for it. Most importantly, all this data is placed in its directory in advance, before the visitor arrives with a request. That is, when we ask Yandex a question, it already knows our answer and gives it to us within a second. But that second includes a number of important processes, which we will now examine in detail.

Internet Indexing

Yandex collects all the information it can reach on the Internet. Using special equipment, it reviews all content, including images (by their visual parameters). The process of collecting and preparing this data is called indexing, and it is performed by a computer system otherwise known as a search robot. The robot regularly crawls indexed sites, checks them for new content, and also scans the Internet for deleted pages. If it discovers that a page no longer exists or has been closed from indexing, it removes it from the search.

How does a search robot find new sites? First, through links from other sites: if a link to a new web resource is placed on an already indexed site, then the next time the robot visits the latter, it will also visit the former. Second, the Webmaster service of the Yandex search engine has a wonderful tool popularly called the "addurilka" (from the English "add URL"). There you can enter the address of your new site, and a search robot will visit it after a while. Third, with the help of the special Yandex.Bar program, the search engine tracks the visits of the users who have it installed; if such a person lands on a new web resource, a robot will soon appear there.

Are all pages included in the search? Millions of pages are indexed every day, and they vary greatly in quality: the information on them ranges from unique content to complete garbage. Moreover, as the statistics say, there is far more garbage on the Internet. The search robot analyzes each document using special algorithms: it determines whether the document contains useful information and whether it can answer a user's query. If not, the page does not make the cut (it is "not accepted into the cosmonaut corps"); if so, it is included in the search.

After the robot has visited a page and determined its usefulness, the page lands in the search engine's storage. Here every document is analyzed down to its very basics - down to the last cog, as the auto mechanics say. The page is cleared of HTML markup, and the clean text undergoes a full inventory: the location of every word is recorded. In this disassembled form, the page turns into a table of numbers and letters, otherwise known as an index. Now, no matter what happens to the web resource containing this page, its latest copy is always available in the search. Even if the site no longer exists, copies of its documents are stored for some time.
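
As a rough illustration of this "inventory", here is a toy sketch in Python (my own simplification, not Yandex's actual storage format): it strips the HTML markup and records the position of every word, producing the kind of table of numbers and letters described above.

```python
import re
from collections import defaultdict

def build_positional_index(html):
    """Strip HTML markup, then record the position of every word -
    the 'table of numbers and letters' the article calls an index."""
    text = re.sub(r"<[^>]+>", " ", html)      # remove tags
    words = re.findall(r"\w+", text.lower())  # tokenize the clean text
    index = defaultdict(list)
    for position, word in enumerate(words):
        index[word].append(position)
    return dict(index)

page = "<h1>Yandex search</h1><p>How search works in Yandex.</p>"
print(build_positional_index(page))
# {'yandex': [0, 6], 'search': [1, 3], 'how': [2], 'works': [4], 'in': [5]}
```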

Each index, together with data on document types, encoding and language, and together with the saved copies, makes up the search database. It is updated periodically and is kept on special servers, with whose help the requests of search engine users are processed.

How often does indexing occur? That depends above all on the type of site. The first type of web resource changes the content of its pages very often: every time the search robot comes, the pages contain different content. Since nothing could be found through them next time, such pages are not included in the index. The second type is a data warehouse, whose pages are periodically given links to documents for downloading. The content of such a site usually does not change, so the robot visits it extremely rarely. For other sites it depends on how quickly the material is updated: the faster new content appears on the site, the more often the search robot comes. And priority is given to the most important web resources (a news site, for example, is an order of magnitude more important than any blog).
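
How such a revisit schedule might look in code is sketched below. This is only a plausible toy policy assumed for illustration, not Yandex's real scheduler: the robot returns sooner to pages that keep changing and backs off from static ones.

```python
def next_crawl_interval(current_interval, content_changed,
                        min_interval=1.0, max_interval=30.0):
    """Illustrative revisit policy (not Yandex's actual scheduler):
    revisit faster when the page changed, back off when it did not."""
    if content_changed:
        interval = current_interval / 2    # content is fresh - come back sooner
    else:
        interval = current_interval * 1.5  # page is static - visit more rarely
    return max(min_interval, min(interval, max_interval))

interval = 7.0  # days between visits
for changed in [True, True, False, False, False]:
    interval = next_crawl_interval(interval, changed)
    print(round(interval, 2))  # 3.5, 1.75, 2.62, 3.94, 5.91
```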

Indexing performs the first function of a search engine - collecting information about new pages on the Internet. But Yandex also has a second function - searching for the answer to a user's query in the already prepared search database.

How Yandex prepares a response

The processing of a query and the delivery of relevant answers is handled by a computer system called Metasearch. For its work it first collects all the background information: which region the query came from, what class it belongs to, whether it contains errors, and so on. After such processing, Metasearch checks whether exactly the same query with the same parameters is already in the database. If it is, the system shows the user the previously saved results. If not, Metasearch turns to the search database that contains the index data.
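
The caching step can be pictured with a small sketch (a toy model assuming a simple in-memory dictionary; the function names are mine, not Yandex's): identical queries with identical parameters are answered from the saved results, and only new ones reach the index.

```python
# Illustrative cache keyed by (query, region), mimicking how a metasearch
# layer might reuse previously saved results; not the real implementation.
cache = {}

def metasearch(query, region, search_database):
    key = (query.lower().strip(), region)
    if key in cache:                           # same query, same parameters
        return cache[key]                      # return previously saved results
    results = search_database(query, region)   # otherwise go to the index
    cache[key] = results
    return results

def search_database(query, region):
    """Stand-in for the real index lookup."""
    return [f"result for '{query}' in {region}"]

print(metasearch("weather", "Moscow", search_database))  # computed
print(metasearch("weather", "Moscow", search_database))  # served from cache
```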

And this is where the amazing things happen. Imagine a single super-powerful computer that stores the entire Internet as processed by the search robots. The user submits a query, a search through the memory cells begins for all the documents involved, the answer is found, and everyone is happy. But now take the case where many queries contain the same words. The system would have to go through the same memory cells each time, which can increase processing time significantly - and a slow answer can mean losing the user, who will turn to another search engine for help.

To avoid such delays, all the copies in the index are distributed across different computers. After receiving a query, Metasearch instructs each of these servers to search its own piece of the text. Then all the data from these machines is returned to a central computer, which combines the results and gives the user the ten best answers. This technology kills two birds with one stone: search time drops severalfold (the answer arrives in a split second), and thanks to the larger number of machines the information is duplicated (data is not lost in sudden breakdowns). The computers holding the duplicated information make up a data center - a room full of servers.
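
Here is a minimal scatter-gather sketch of that idea (illustrative only; the shard contents, scores and function names are invented): each "server" scores its own piece of the index in parallel, and a central step merges the partial results into one top list.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Each "server" holds its own shard of the index: (score, document) pairs.
shards = [
    [(0.91, "doc-a"), (0.40, "doc-b")],
    [(0.87, "doc-c"), (0.15, "doc-d")],
    [(0.95, "doc-e"), (0.60, "doc-f")],
]

def search_shard(shard, query):
    """Pretend each machine scores only its own piece of the index."""
    return shard  # a real shard would filter and score by the query

def scatter_gather(query, top_n=3):
    """Query all shards in parallel, then merge into one ranked top-N list."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: search_shard(s, query), shards)
        merged = [hit for part in partials for hit in part]
    return heapq.nlargest(top_n, merged)  # central machine keeps the best answers

print(scatter_gather("example"))
# [(0.95, 'doc-e'), (0.91, 'doc-a'), (0.87, 'doc-c')]
```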

When a search engine user asks a question, in roughly 20 cases out of 100 its goal is ambiguous. For example, if he writes the word "Napoleon" in the search bar, it is not yet clear what answer he expects: a cake recipe or a biography of the great commander. Or the phrase "Brothers Grimm": fairy tales, films, or a musical group. To narrow this range of possible goals down to specific answers, Yandex has a special technology called Spectrum. It infers user needs from search query statistics. From all the questions visitors ask Yandex, Spectrum identifies the various objects in them (names of people, titles of books, car models, etc.). These objects are assigned to categories, of which there are currently more than 60. With their help, the search engine keeps the different meanings of the words in user queries in its database. Interestingly, the categories are re-checked periodically (the analysis runs a couple of times a week), which allows Yandex to answer the questions posed more accurately.

Based on Spectrum technology, Yandex has introduced dialog prompts. They appear below the search bar in which the user is typing an ambiguous query and show the categories to which the subject of the question may belong. Further search results depend on which category the user chooses.

From 15 to 30% of all users of the Yandex search engine want only local information (data about the region where they live) - for example, new films in their city's cinemas. The answer to such a query should therefore differ from region to region, and for this Yandex uses its region-based search technology. For example, here are the answers that residents looking for the repertoire of their "Oktyabr" cinema might receive:

And this is the result that residents of the city of Stavropol will receive for the same query:

The user's region is determined primarily by his IP address. Sometimes this data is inaccurate, because some providers operate in several regions at once and may change the IP addresses of their users. If this happens to you, you can easily change your region in the search engine's settings: it is listed in the upper right corner of the results page.

The Yandex search engine - how results are presented

When Metasearch has prepared an answer, the Yandex search engine must display it on the results page: a list of links to the found documents with a little information about each. The task of the results technology is to give the user the most relevant answers in the most informative way. The template for one such link looks like this:

Let's look at this form of result in more detail. For search result title Yandex often uses the name of the page title (what optimizers write in the title tag). If it is not there, then the words from the title of the article or post appear here. If the title text is large, the search engine places in this field the fragment that is most relevant to the given query.

Very rarely, it happens that the title does not match the content of the query. In this case Yandex builds its own result title from the text of the article or post; it will always contain the query words.

For the snippet, the search engine uses all the text on the page. It selects every fragment containing an answer to the query, then picks the most relevant one and inserts it into the field under the document link. Thanks to this approach, a competent optimizer who sees the snippet can rework the text and thereby make the link more attractive.
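
A toy version of such fragment selection might look like this (my own simplification, not Yandex's snippet algorithm): slide a window over the page text and keep the fragment containing the most query words.

```python
import re

def best_snippet(text, query, window=8):
    """Pick the text fragment with the most query words in it -
    a toy version of how a snippet might be chosen."""
    query_words = set(query.lower().split())
    words = text.split()
    best, best_hits = "", -1
    for start in range(max(1, len(words) - window + 1)):
        fragment = words[start:start + window]
        hits = sum(1 for w in fragment
                   if re.sub(r"\W", "", w.lower()) in query_words)
        if hits > best_hits:
            best, best_hits = " ".join(fragment), hits
    return best

page_text = ("Yandex is a search engine. It builds an index of pages "
             "and answers user queries in a fraction of a second.")
print(best_snippet(page_text, "index of pages"))
```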

To make the result easier to perceive, headings are formatted as links (highlighted in blue with underlining). To make the web resource attractive and recognizable, a favicon - the site's small corporate icon - is added to the left of the text on the first line, before the heading. All query words that appear in the answer are also highlighted in bold for easier reading.

Recently the Yandex search engine has been adding various extra information to the snippet to help the user find his answer even faster and more precisely. For example, if a user writes the name of an organization in his query, Yandex will add its address, contact numbers and a link to its location on the geographical maps. If the search engine knows the structure of the site containing the answering document, it will show that too. Plus, Yandex can add the most-visited pages of that web resource to the snippet, so that the visitor can go straight to the section he needs and save time.

There are snippets that contain the price of a product in an online store, a hotel or restaurant rating in the form of stars, and other interesting figures about the objects in the found documents. The purpose of such information is to give a full picture of the items or objects that interest the user.

In general, with these various examples, the answers page looks like this:

Ranking and assessors

Yandex's task includes not only searching for all possible options answer, but also the selection of the best (relevant). After all, the user will not rummage through all the links that Yandex will provide him with as a search result. The process of organizing search results is called ranking . That is, it is the ranking that determines the quality of the proposed answers.

There are rules by which Yandex determines relevant pages:

  • Sites that degrade search quality are downgraded on the results page. Usually these are web resources whose owners try to deceive the search engine - for example, sites with pages containing meaningless or invisible text that a search robot can see and parse but a visitor cannot, or sites that, when a link is clicked in the search results, immediately redirect the user to a completely different site.
  • Sites with erotic content are excluded from the results or heavily downgraded, because such web resources often use aggressive promotion methods.
  • Sites infected with viruses are neither downgraded nor excluded from the results; instead, the user is warned of the danger with a special icon. Yandex assumes that such web resources may still contain documents important for the visitor's query.

For example, this is how Yandex ranks sites for the query "apple":

In addition to ranking factors, Yandex uses special samples of queries together with the answers that search engine users consider most suitable. No machine can currently produce such samples - this is the prerogative of humans, who at Yandex are called assessors. Their task is to analyze the found documents and evaluate the answers to given queries. They select the best answers, from which a special training set is formed. From this set the search engine learns the relationship between relevant pages and their properties, and with this information it can select a near-optimal ranking formula for each query. The method for constructing this formula is called Matrixnet. The advantage of the system is its resistance to overfitting, which makes it possible to take into account a very large number of ranking factors without inflating the number of spurious ratings and patterns.
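
To make the idea of "learning a ranking formula from assessor grades" concrete, here is a deliberately tiny sketch. It uses plain least-squares gradient descent over made-up features and grades; Matrixnet itself boosts decision trees and is far more sophisticated, so treat this only as the general shape of the task.

```python
# A toy "training set" in the spirit described above: assessors grade
# (query, page) pairs, and the machine fits a ranking formula to the grades.
# This is ordinary least squares, NOT Matrixnet.
samples = [
    # features: (text match, link score, freshness) -> assessor grade
    ((0.9, 0.7, 0.8), 5.0),
    ((0.4, 0.9, 0.2), 3.0),
    ((0.1, 0.2, 0.9), 1.0),
    ((0.8, 0.1, 0.5), 4.0),
]

def fit_weights(samples, lr=0.05, epochs=2000):
    """Gradient descent on squared error between formula and assessor grades."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for features, grade in samples:
            predicted = sum(wi * fi for wi, fi in zip(w, features))
            error = predicted - grade
            w = [wi - lr * error * fi for wi, fi in zip(w, features)]
    return w

weights = fit_weights(samples)
score = lambda f: sum(wi * fi for wi, fi in zip(weights, f))
# The learned formula should rank the highest-graded page first.
print(sorted(samples, key=lambda s: score(s[0]), reverse=True)[0])
```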

At the end of my post, I want to show you some interesting statistics collected by the Yandex search engine in the course of its work.

1. Popularity of personal names in Russia and Russian cities (data taken from accounts of bloggers and social network users in March 2012).

2. Statistics on various kinds of interests.

This concludes my post about how the Yandex search engine works.

In 1863, the great writer Jules Verne wrote his next book, "Paris in the Twentieth Century". In it he described in detail the subway, the car, the electric chair, the computer and even the Internet. However, the publisher refused to print the book, and it lay untouched for more than 120 years until Jules Verne's great-grandson found it in 1989. The book was published in 1994.

A search engine, or simply "searcher", is a system that finds Internet pages matching the user's query. The most famous search engine in the world is Google, the most popular in Russia is Yandex, and one of the oldest is Yahoo. In a search engine's architecture we can distinguish the search engine proper - the core of the system, represented by a set of software modules; the database, or index, which stores information about all the Internet resources known to the system; and a set of sites that serve as users' entry points into the system (www.google.com, www.yandex.ru, ru.yahoo.com, etc.). All this corresponds to the classical three-tier architecture of information systems: there is a user interface, business logic (here represented by the implementation of the search algorithms), and a database.

Specifics of Internet search

At first glance, searching the Internet is not much different from ordinary information retrieval - say, a query to a database or the task of finding a file on a computer. The developers of the first Internet search engines thought so too, but over time they realized they were mistaken...

The first difference between Internet search and regular search is that a search algorithm over a database assumes the structure of the data is known in advance to both the search engine and the author of the query. On the Internet, for obvious reasons, this is not so: Internet pages form not a directory structure but a network, which affects the search algorithms, and the format of the data posted on Internet resources is not controlled by anyone.

The second difference, partly a consequence of the first, is that a query is presented not as a set of parameter values (search criteria), but as text written by a person in his natural language. Thus, before the search can begin, one still has to understand what exactly the author of the query wants. Note that it is not another person who must understand this, but a computer.

The third difference is less obvious but no less fundamental: in a catalog or database, all elements have equal rights. On the Internet there is competition, and consequently a division into more "reliable information providers" and sources closer in status to "information garbage". That is how people classify resources, and search engines must do the same.

Finally, it should be added that the search area consists of billions of pages of several kilobytes or more each. About ten million pages are added daily, and as many again are updated, all in a variety of digital formats. Unfortunately, even modern technology and the resources available to the leaders of the Internet search market do not allow all this diversity to be processed "on the fly" and in full.

What does a search engine consist of?

First of all, it is important to grasp one more - probably the most significant - difference between the work of an Internet search engine and that of any other information system searching catalogs and databases. An Internet search engine does not search among what is on the Internet at the moment the query is received; it tries to generate the answer from its own information store - a database called the index, where it keeps a dossier on everything known to it and which it periodically updates. In other words, the search engine works not with the original but with a projection of the admissible search range. Any recent change on the Internet can be reflected in the search results only after the corresponding pages have been indexed, i.e., added to the search engine's index. So a search system, to a first approximation, consists of a search engine, a database (the index), and entry points into the system.

Now briefly about what a search engine consists of:

  • Spider. An application that downloads the pages of Internet resources. The spider does not actually "crawl" anywhere - it simply requests the contents of pages the same way an ordinary Internet browser does, sending an HTTP request to the server and receiving a response. Once the page content is downloaded, it is passed to the indexer and the crawler, discussed below.

  • Indexer. The indexer performs an initial analysis of the downloaded page's content, extracts the main parts (page title, description, links, headings, etc.) and files it all into the sections of the search database - that is, places it in the search engine's index. This process is called indexing of Internet resources, hence the name of the subsystem. Based on the initial analysis, the indexer may also decide that the page is not "worthy" of being in the index at all. The reasons vary: the page has no title, it is an exact copy of another page already in the index, or it contains links to resources prohibited by law.

  • Crawler. This "animal" is designed to "crawl" along the links found on the page downloaded by the spider. The crawler analyzes the paths leading from the current page to other sections of the site or to pages of external Internet resources, and determines the order in which the spider will follow the threads of the World Wide Web. It is the crawler that finds pages new to the search engine and hands them to the spider (a minimal sketch of the spider and crawler follows this list). Its work is based on breadth-first and depth-first graph search algorithms.

  • Subsystem for processing and delivering results (Search Engine and Results Engine). The most important part of any search engine. Companies keep the operating algorithms of this subsystem a strict trade secret, since it is this part that is responsible for the adequacy of the system's response to the user's query. It has two main components:
    • Ranking subsystem. Ranking is the ordering of the pages of Internet sites according to their relevance to a specific query. Page relevance, in turn, is the degree to which the content of a page corresponds to the meaning of the query - a value the search engine determines on its own from a huge number of parameters. Ranking is the most mysterious and controversial part of a search engine's "artificial intelligence". Besides the structure and content of the page itself, its ranking is also influenced by the number and quality of links leading to it from other sites, the age of the site's domain, the behavior of users viewing the page, and many other factors.

    • Subsystem for delivering results. Its tasks include interpreting the user's query, translating it into the language of structured queries to the index, and generating the search results pages. Besides parsing the query text itself, the search engine may also take into account:
      • The query context, formed from the meaning of queries the user made previously. For example, if a user often visits sites on automotive topics, then on a query for "Volga" or "Oka" he probably wants information about cars of those brands, not about where the Russian rivers of the same names begin and flow. This is called personalized search: the results for the same query differ significantly between users.

      • User preferences, which the search engine can "guess" by analyzing which links the user selects on results pages. This is another way of adjusting the query context: through his actions, the user effectively tells the machine what he actually wanted to find. As a rule, search engines also try to add to the results pages that are relevant to the query but belong to quite different spheres of life. Say a user is interested in cinema and therefore often selects links to pages with movie announcements, even when those pages are not entirely relevant to the original query. When generating the answer to his next query, the system may then give preference to pages describing films whose titles contain words from the query text.

      • Region, which is very important for commercial queries about buying goods and services from local suppliers. If you are interested in sales and discounts and are in Moscow, you are most likely not at all interested in what promotions on this topic are running in St. Petersburg, unless you say so explicitly in the query text. Information about Moscow sales should appear first in the results. Modern search engines therefore divide queries into geo-dependent and geo-independent. Most likely, if the engine decides your query is geo-dependent, it automatically appends a region indicator, which it tries to determine from information about your Internet provider.

      • Time. Search engines sometimes have to work out when the events described on a page took place. After all, information constantly goes out of date, and the user first of all needs links to the latest news, current forecasts and announcements of events that have not yet ended or are still to come. Understanding that a page's relevance depends on time, and relating that to the moment the query is executed, also demands a fair amount of intelligence from the search engine.

      Next, the search engine finds the key query closest in meaning in its index and generates the results by sorting the links in descending order of relevance. Each key query in the index has its own ranking of the pages relevant to it. The system does not create a new key query for every combination of letters and numbers; it does so based on an analysis of how frequently users ask particular queries. The engine may also mix the rankings of different key queries in the results if it believes that is what the user is looking for.
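
Here is a minimal sketch of the spider and crawler roles described in the list above (using only the Python standard library; the URL is a placeholder): the "spider" is an ordinary HTTP request, and the "crawler" merely extracts links for the spider to download next.

```python
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """The 'crawler' part: collect hrefs from a downloaded page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url, self.links = base_url, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def spider_fetch(url):
    """The 'spider' part: an ordinary HTTP request, nothing actually crawls."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

html = spider_fetch("https://example.com/")   # placeholder URL
parser = LinkExtractor("https://example.com/")
parser.feed(html)
print(parser.links)  # new URLs handed back for the spider to download next
```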

General principles of search engine operation

You need to understand that Internet search services are a very, very profitable business. There is no need to go into details about how companies like Google and Yandex make a living: the bulk of their profit is income from contextual advertising. And since Internet search is an extremely profitable business, competition among such companies is fierce. What determines competitiveness in the Internet search market? The answer is the quality of the results. Logically, the higher it is, the more new users the system attracts, and the more the contextual advertising placed on its results pages is worth. Search engine developers expend great effort "cleaning" their results of all kinds of information garbage, popularly called spam. How this is done deserves a separate article, but here I will formulate the general principles of search engine behavior as conclusions from everything said above.

  1. The search engine, represented by its spiders and crawlers, constantly scans the Internet for new pages and updates to existing ones, since stale information is valued lower.

  2. The search engine periodically updates the rankings of resources by their relevance to key queries, since new pages constantly appear in the index. This process is called updating the search results.

  3. Because of the huge volume of information posted on the World Wide Web and the limits of its own resources, the search engine always tries to load only what it deems necessary. Its arsenal includes all kinds of filters that cut off much of the unnecessary at the indexing stage or throw spam out of the index when the search results are updated.

  4. When analyzing a query, modern search engines try to take into account not only the text of the query itself, but also its environment: the context and user preferences mentioned earlier, as well as the time of the query, the region, and much more.

  5. The relevance of a particular page is influenced not only by its internal parameters (structure, content), but also by external parameters, such as links to the page from other sites and user behavior when viewing it.

The work of search engines is constantly being improved. A search that is perfect (for humans) would be possible only if every decision regarding indexing and ranking were made by a commission of a large number of specialists from all fields and areas of human activity. Since that is unrealistic, the commission is replaced by expert systems, heuristic search algorithms and other elements of artificial intelligence. The work of all these subsystems could probably give more adequate results if absolutely all the data openly available on the Internet could be processed, but that is practically impossible. Imperfect artificial intelligence and limited resources are the two main reasons why search results do not always please users, but both can be cured with time. Today, in my opinion, the work of the best-known large search engines fully meets the needs and expectations of their users.

Hello, dear readers!

There are now quite a few search engines in the global Internet space. Each has its own algorithms for indexing and ranking sites, but in general the operating principle of search engines is quite similar.

Knowing how a search engine works is, amid rapidly growing competition, a significant advantage when promoting not only commercial but also informational sites and blogs. This knowledge helps you build an effective website optimization strategy and get to the top of the search results for your promoted query groups with less effort.

How search engines work

The purpose of the optimizer's work is to "adjust" the promoted pages to the search algorithms and thereby help those pages achieve high positions for particular queries. But before starting to optimize a website or blog, it is necessary to understand, at least superficially, how search engines work, in order to anticipate how they may react to the actions the optimizer takes.

Of course, the exact details of how search results are formed are information that search engines do not disclose. However, for proper optimization work, an understanding of the main principles by which search engines operate is sufficient.

Information search methods

The two main methods used by search engines today differ in their approach to information retrieval.

  1. The direct search algorithm, in which the key phrase (the user's query) is matched against each document stored in the search engine's database, is a reliable method that finds all the necessary information. Its disadvantage is that on large data sets the search takes a long time.
  2. The inverted index algorithm, in which the key phrase is matched against a list of the documents that contain it, is convenient for databases holding tens and hundreds of millions of pages. Here the search runs not through all the documents but through special files: lists of the words found on website pages, where each word is accompanied by the coordinates of the positions at which it occurs and by other parameters. It is this method that such well-known search engines as Yandex and Google use today (both methods are sketched in code right after this list).
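
The contrast between the two methods is easy to show in a few lines of Python (a toy corpus and function names of my own): the direct method scans every document on every query, while the inverted index is built once and then answers a query with a single lookup.

```python
from collections import defaultdict

docs = {1: "how search engines work",
        2: "search engine ranking factors",
        3: "baking recipes for beginners"}

def direct_search(word):
    """Method 1: scan the full text of every document - reliable but slow."""
    return [doc_id for doc_id, text in docs.items() if word in text.split()]

# Method 2: build the inverted index once...
inverted = defaultdict(list)
for doc_id, text in docs.items():
    for word in set(text.split()):
        inverted[word].append(doc_id)

def index_search(word):
    """...then answering a query is a single dictionary lookup."""
    return sorted(inverted.get(word, []))

print(direct_search("search"))  # [1, 2] - visits all three documents
print(index_search("search"))   # [1, 2] - visits none of them
```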

It should be noted here that when a user types into the browser's search bar, the search is performed not directly on the Internet but in pre-collected, saved and currently relevant databases containing blocks of information (website pages) already processed by the search engines. Fast generation of results is possible precisely because of this work with inverted indexes.

The text content of the pages (the direct index) is also saved by search engines and used to automatically generate snippets from the text fragments best suited to the query.

Mathematical ranking model

To speed up the search and simplify the generation of results that best match the user's query, a certain mathematical model is used. Its task is to find the necessary pages in the current inverted index database, assess their degree of correspondence to the query, and arrange them in descending order of relevance.

Simply finding the desired phrase on a page is not enough. The search engine calculates the weight of a document relative to the user's query. For each query, this parameter is computed from two values: the frequency of the word on the analyzed page, and a coefficient reflecting how rarely the same word occurs in the other documents of the search engine's database. The product of these two quantities is the weight of the document.

Of course, the algorithm presented here is greatly simplified, since search engines have a number of additional coefficients at their disposal, but the idea stays the same: the more often a single word from the user's query occurs in a document, the higher the latter's weight. However, if certain limits (different for each query) are exceeded, the text content of the page is considered spam.
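
In code, the weight described above is essentially the classic TF-IDF product. The sketch below is a simplification under my own assumptions (a three-document corpus and the common logarithmic form of the rarity coefficient), not the exact formula of any search engine.

```python
import math

docs = ["yandex search engine", "google search", "pie recipe"]

def tf_idf_weight(word, doc, corpus):
    """Weight = (frequency of the word on the page) x (how rare the word is
    across all documents) - the product described above, using the usual
    logarithmic form of the rarity coefficient."""
    words = doc.split()
    tf = words.count(word) / len(words)
    containing = sum(1 for d in corpus if word in d.split())
    idf = math.log(len(corpus) / (1 + containing)) + 1
    return tf * idf

for d in docs:
    print(round(tf_idf_weight("search", d, docs), 3), "->", d)
# the shorter document that still contains "search" gets the higher weight
```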

Basic functions of a search engine

All existing search engine systems are designed to perform several important functions: searching for information, indexing it, assessing its quality, ranking it correctly, and generating the search results. The primary task of any search engine is to give the user the information he is looking for - the most accurate possible answer to a specific query.

Since most users have no idea how Internet search engines work, and the ability to teach users to search "correctly" is very limited (for example, via search tips), developers are forced to improve the search itself. This means creating algorithms and operating principles that find the required information regardless of how "correctly" the search query is formulated.

Scanning

This is the tracking of changes in already indexed documents and the search for new pages that can be presented in the results for user queries. Search engines scan resources on the Internet using specialized programs called spiders or search robots.

Scanning Internet resources and collecting data is carried out automatically by search bots. After the first visit to a site and its inclusion in the search database, the robots begin to visit it periodically to monitor and record changes in its content.

Since the number of developing resources on the Internet is large and new sites appear every day, this process never stops for a minute. This operating principle allows Internet search engines always to have up-to-date information about the sites available on the network and their content.

The main task of a search robot is to search for new data and transfer it to the search engine for further processing.

Indexing

A search engine can find data only on sites represented in its database - in other words, indexed ones. At this step the system must determine whether the information found should be entered into the database and, if so, in which section. This process, too, is performed automatically.

It is believed that Google indexes almost all the information available on the Internet, while Yandex approaches content indexing more selectively and not as quickly. Both search giants of the Runet work for the benefit of the user, but the general operating principles of Google and Yandex differ somewhat, since they rest on the unique software solutions that make up each system.

A point common to both is that indexing an entirely new resource takes longer than indexing new content on sites already known to the system. Information appearing on sites that enjoy high trust from the search engines gets into the index almost instantly.

Ranking

Ranking is the assessment by the search engine's algorithms of the significance of the indexed data and its arrangement according to factors specific to that engine. The collected information is processed to generate results for the entire range of user queries. What appears higher or lower in the results is determined entirely by how the chosen search engine and its algorithms work.

Sites in the search engine's database are divided into topics and query groups. For each group of queries a preliminary ranking is formed, which is then adjusted further. The positions of most sites change after each SERP update - a ranking update that happens daily in Google and every few days in Yandex.

A person as an assistant in the struggle for the quality of results

The reality is that even the most advanced search engines, such as Yandex and Google, still need human help to produce results that meet their accepted quality standards. Wherever the search algorithm does not perform well enough, its results are adjusted manually, by assessing page content against multiple criteria.

A large army of specially trained people from different countries - search engine moderators, or assessors - performs an enormous amount of work every day, checking how well website pages match user queries and filtering the results of spam and prohibited content (texts, images, videos). The assessors' work makes the search results cleaner and contributes to the further development of self-learning search algorithms.

Conclusion

As the Internet develops and the standards and forms of content presentation gradually change, the approach to search changes too: the processes of indexing and ranking information and the algorithms used are improved, and new ranking factors appear. All this lets search engines produce results of the highest quality, adequate to user queries, while at the same time complicating life for webmasters and the specialists engaged in website promotion.

In the comments below the article, I invite you to say which of the main RuNet search engines - Yandex or Google - works better in your opinion, providing the user with a better search, and why.

Search engines are among the main and most important Internet services.

With the help of search engines, billions of Internet users find the information they need.

What is a search engine?

A search engine is a software and hardware complex that uses special algorithms to process a huge amount of information about a wide variety of sites and their content, down to each individual page.

From the point of view of ordinary visitors, a search engine is a smart site that contains a lot of information and provides answers to any user query.

In different countries, Internet users prefer different search engines. In the English-speaking segment of the Internet, the most popular search engine is Google.

Search engines in RuNet

In Russia, more than half of users prefer the Yandex search engine, while Google accounts for about 35% of queries. The rest use Rambler, Mail.ru, Nigma and other services.

In Ukraine, about 60% of users use Google, while Yandex accounts for slightly more than 25% of processed queries.

Therefore, when promoting sites on the Runet, specialists focus on the Yandex and Google search engines.

Search engine tasks

To answer visitors' questions as accurately as possible, search engines must perform the following tasks:

  1. Quickly and efficiently collect information about the various pages of different sites.
  2. Process the information about these pages and determine which query or queries they correspond to.
  3. Form and deliver search results in response to user queries.

Components of search engines

Search engines are complex software systems that consist of the following main blocks:

  1. Data collection.
  2. Indexing.
  3. Calculation.
  4. Ranking.

This division is conditional, since the workings of different search engines differ somewhat from one another.

1. Data collection

At this stage, the task is to find new documents and make a plan for visiting and scanning them.

Webmasters should let search engines know about new material by submitting the page address through the "add URL" page or by broadcasting an announcement of the page on social networks.

Personally, I use the latter method and find it quite sufficient.

Comment: I'll digress a little and talk about how effectively posting announcements on social networks speeds up the indexing of new website pages.

I use the text.ru service to control and record the uniqueness of text on the pages of my website.

It checks uniqueness thoroughly, records it, and lets you place a uniqueness banner on the pages of your website.

But sometimes there is a long processing queue on this service. Several times I did not wait for the uniqueness check, posted the article on the site, and circulated it on social networks.

Whenever the uniqueness check was delayed by about an hour or more, the reported uniqueness was always 0% (the service was finding my own, already indexed page). This means that within less than an hour of posting, the page had already been indexed and entered into the search engine's database.

2. Indexing

Having collected data about new web pages, search engines place it in their database. In doing so, an index is formed - a key for fast access to the data about a page when needed.

3. Calculation

After entering the database, the pages of our sites go through a stage in which various parameters and indicators are calculated.

No one except the developers of the search algorithms themselves can say exactly how many of these indicators there are and how they are calculated.

4. Ranking

Then, based on the calculated parameters and indicators, the relevance of each page to particular queries is determined and the page is ranked.

This is what makes possible the fast, high-quality generation of results pages for those queries.

Search engines then generate answers to user queries and present them in the form of a search results page.

It should be noted that the algorithms for processing page data, deriving indicators and ranking are constantly being improved, and the priorities by which ranking occurs change.
Search engines strive to answer user queries as accurately as possible, trying to take into account the nature of the query and the interests of the particular user: his place of residence, age, gender, habits and inclinations.

The most popular web service of our time is the search engine. This is understandable: the days when the first Internet users could keep track of every new thing appearing on the Internet are long gone.

So much information appears and accumulates that it has become very difficult for a person to find exactly what he needs. Imagine what searching the Internet would be like if the average user had to hunt for information who knows where - by manual search you will not find much.

Search engine, what is it?

It's good if the user already knows sites that may contain the necessary information, but what if he does not? To make it easier for a person to find the information he needs on the Internet, search engines (or simply "searchers") were invented. The search engine performs one very important function, without which the Internet would not be what we are used to: searching for information on the net.

A search engine is a special web site that provides users, in response to their queries, with hyperlinks to the pages of sites that answer the given search query.

To be a little more precise, the search for information on the Internet is carried out by a software and hardware complex together with a web interface for interacting with users.

The web interface - a visible, understandable shell - was created for human interaction with the search engine, and this approach by search engine developers makes searching easier for many people. As a rule, searches are carried out on the Internet proper, but there are also search systems for FTP servers, for certain kinds of goods on the World Wide Web, for news, and for other areas.

The search can cover not only the text content of sites but also other types of information a person may look for: images, videos, sound files, etc.

How does a search engine search?

Searching the Internet, just like browsing websites, is done with an Internet browser. The search itself is carried out only after the user has typed his query into the search bar.

Any search system contains the software part on which the entire search mechanism rests; it is called the search engine - a software package that provides the ability to search for information. After a person contacts a search engine and enters a query into the search bar, the engine generates a page with a list of results, with the most relevant ones (in the engine's opinion) placed higher.

Search relevance means finding the materials that best match the user's query and placing hyperlinks to the more accurate results above the others on the results page. The distribution of the results themselves is called site ranking.

So how does a search engine prepare its material for publication, and how does it actually search for information? Information on the network is collected by a robot (bot) unique to each search engine, which goes by several other names, such as crawler or spider, and the work of the search system itself can be divided into three stages:

The first stage of a search engine's work is scanning the sites of the global network and collecting copies of web pages on its own servers. This creates a huge amount of information that has not yet been processed and is not yet suitable for search results.

The second stage of the search engine's work comes down to putting the information received at the first stage in order. The sorting is done in a way that favors the fast, high-quality search users actually expect from a search engine. This stage is called indexing: the pages are now prepared for delivery, and the resulting database is the index.

It is the third stage that determines the search results: after receiving a query from its client, the search engine selects, using the keywords (or words close to them) specified in the query, the information most relevant to the request, and delivers it. Since there is a lot of information, the engine ranks it according to its algorithms.
The best search engine is considered to be the one that can provide the material that best answers the user's query. But even here results may be influenced by people interested in promoting their own sites; such sites appear in the results often (though not always) - but not for long.

Although world leaders have already emerged in many regions, search engines keep developing the quality of their search: the better the search they can provide, the more people will use them.

How to use the search engine?

What a search engine is and how it works is now clear, but how do you use it correctly? Most search sites have a search bar with a Find or Search button next to it. The query is entered into the search bar, after which you press the search button or, more commonly, the Enter key, and within seconds you receive the result of the query in the form of a list.

But it is not always possible to get the correct answer to a search query on the first try. To keep the search for what you want from becoming painful, compose your search query correctly and follow the recommendations described below.

Composing the search query correctly

Below are tips for using a search engine. Following a few tricks and rules when searching for information will let you get the desired result much faster. Follow these guidelines:

  1. Correct spelling ensures the maximum number of matches with the desired information object (although modern search engines have learned to correct spelling errors, this advice should not be neglected).
  2. By using synonyms in your query, you can cover a wider search range.
  3. Sometimes changing one word in the query text brings better results; reformulate the query.
  4. Bring specifics into your query; use exact phrases that define the main essence of the search.
  5. Experiment with keywords. Using the right keywords and phrases helps pin down the main point, and the search engine will return a more relevant result.

So a search engine is nothing more than the opportunity to find information of interest, usually completely free of charge - to learn something, understand something, or draw the right conclusion for yourself. Many people can no longer imagine life without voice search, where you need not type the text but simply speak your request, with a microphone as the input device. All this points to the constant development of search technologies on the Internet and the need for them.
