Measuring the amount of information in informatics. Determining the information content of a message

The purpose of the lesson:

  1. Have an idea of the alphabetical approach to determining the amount of information;
  2. Know the formula relating the number of possible information messages to the amount of information a message carries;
  3. Be able to solve problems on determining the number of possible information messages and the amount of information that a received message carries.

Lesson procedure

1. Review of prior knowledge:

Children, let's look at what we can see outside the window. What can you say about nature? (Winter has come.)
- But why did you decide that winter has come? (It's cold, it's snowing.)
- Yet nowhere is it written that these are signs of winter. (But we know what it all means: winter has come.)

Therefore, it turns out that the knowledge that we extract from the surrounding reality is information. (slide 1)

Warm up.

Fill in the table and use the arrows to show the matches.

Can the amount of information be measured, and if so, how? (Yes)

It turns out that information can also be measured and quantified.

There are two approaches to measuring information. Today we will become acquainted with one of them. (See appendix, slide 2)

2. Learning new material.

How can you find the amount of information?

Consider an example.

We have a small text written in Russian. It consists of letters of the Russian alphabet, digits, and punctuation marks. For simplicity, we will assume that all characters occur in the text with equal probability.

The set of symbols used in a text is called an alphabet.

In computer science, the alphabet is understood to include not only letters but also digits, punctuation marks, and other special characters.

The alphabet has a size (the total number of characters), which is called the power of the alphabet. Under the alphabetical approach, each character of a text is considered to have a certain "information weight". As the power of the alphabet increases, so does the information weight of its characters.

Let us denote the power of the alphabet by N.

Let's find the relationship between the information weight of a character (i) and the power of the alphabet (N). The smallest alphabet contains 2 characters, denoted "0" and "1". The information weight of a character of this binary alphabet is taken as the unit of information and is called 1 bit. (See appendix, slide 3)

N:   2      4      8      16     32     64     128    256
i:  1 bit  2 bits 3 bits 4 bits 5 bits 6 bits 7 bits 8 bits
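The relationship N = 2^i behind this table can be checked with a few lines of code. Below is a minimal Python sketch (an illustration of mine, not part of the original lesson materials) that reproduces the table:

```python
# Tabulate the relation N = 2^i for information weights of 1..8 bits.
import math

for i in range(1, 9):          # information weight in bits
    N = 2 ** i                 # power of the alphabet encodable with i bits
    print(f"N = {N:3d}  ->  i = {i} bit(s)")

# Going the other way: the weight of one character of an alphabet of power N
# is i = log2(N) (for N a power of two).
print(math.log2(256))          # 8.0 bits = 1 byte
```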

The computer also uses its own alphabet, which can be called the computer alphabet. It includes 256 characters; this is the power of the computer alphabet.

We also found out that 256 different characters can be encoded using 8 bits.

8 bits is such a characteristic value that it has been given its own name: the byte.

1 byte = 8 bits

Using this fact, one can quickly calculate the amount of information contained in a computer text, i.e. a text typed using a computer. Since most articles, books, publications, etc. are written using text editors, in this way you can find the information volume of any message created this way.

Let's see the rule for measuring information from the point of view of the alphabetical approach on the slide. (See appendix slide 4)

Example:

Find the information volume of a computer text page.

Solution:

Let's use the rule.

1. Find the power of the alphabet: N = 256.
2. Find the information volume of one character: N = 2^i, so 2^i = 256 and i = 8 bits = 1 byte.
3. Find the number of characters on the page (approximately).

(Find the number of characters in a line and multiply by the number of lines.)

Explanation:

Have the children choose a random line and count the number of characters in it, including all punctuation marks and spaces.

40 characters * 50 lines = 2000 characters.

4. Find the information volume of the entire page: 2000 * 1 byte = 2000 bytes
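For teachers who want to automate this estimate, here is a small illustrative Python helper (the function name and default value are my own, assuming the 256-character computer alphabet where 1 character = 1 byte):

```python
# Estimate the information volume of a page of computer text.
def page_volume_bytes(chars_per_line: int, lines: int, bytes_per_char: int = 1) -> int:
    """Information volume of a page, in bytes."""
    return chars_per_line * lines * bytes_per_char

volume = page_volume_bytes(40, 50)   # the estimate from the example above
print(volume)                        # 2000 bytes
```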

Agree that a byte is a small unit of information. The following units are used to measure larger amounts of information. (See appendix, slide 5)

3. Consolidation of the studied material.

On the board:

Fill in the gaps with numbers and check for correctness.

1 KB = ___ bytes = ______ bits,
2 KB = _____ bytes = ______ bits,
24576 bits =_____bytes =_____Kbytes,
512 KB = ___ bytes = _____ bits.
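A possible answer key for these blanks, sketched as a short Python check (assuming the conversions used in this lesson: 1 byte = 8 bits, 1 KB = 1024 bytes):

```python
# Check the unit-conversion exercise above.
BITS_PER_BYTE = 8
BYTES_PER_KB = 1024

print(1 * BYTES_PER_KB, 1 * BYTES_PER_KB * BITS_PER_BYTE)      # 1024 bytes, 8192 bits
print(2 * BYTES_PER_KB, 2 * BYTES_PER_KB * BITS_PER_BYTE)      # 2048 bytes, 16384 bits
print(24576 // BITS_PER_BYTE, 24576 // BITS_PER_BYTE / BYTES_PER_KB)  # 3072 bytes, 3.0 KB
print(512 * BYTES_PER_KB, 512 * BYTES_PER_KB * BITS_PER_BYTE)  # 524288 bytes, 4194304 bits
```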

The students are given the following tasks:

1) The message is written using an alphabet containing 8 characters. How much information does one letter of this alphabet carry?

Solution: N = 8, then i = 3 bits

2) A message written with letters from the 128-character alphabet contains 30 characters. How much information does it carry?

1. N = 128, K = 30
2. N = 2^i, so i = 7 bits (the volume of one character)
3. I = 30 * 7 = 210 bits (the volume of the whole message)

4. Creative work.

Type a text on the computer whose information volume is 240 bytes.

5. Lesson summary.

What new things did we learn in today's lesson?
- How is the amount of information determined from the alphabetical point of view?
- How do we find the power of the alphabet?
- What is 1 byte equal to?

6. Homework (See appendix slide 6).

Learn the rule for measuring information in terms of the alphabetical approach.

Learn the units of measure for information.

Solve the following problems:

1) The power of some alphabet is 64 characters. What will be the amount of information in a text consisting of 100 characters?
2) The information volume of the message is 4096 bits. It contains 1024 characters. What is the power of the alphabet with which this message is composed?

Amount of Information

The amount of information as a measure of reducing the uncertainty of knowledge.
(Meaningful approach to determining the amount of information)

The process of cognition of the surrounding world leads to the accumulation of information in the form of knowledge (facts, scientific theories, etc.). The acquisition of new information leads to an increase in knowledge or, as is sometimes said, to a decrease in the uncertainty of knowledge. If a certain message leads to a decrease in the uncertainty of our knowledge, then we can say that such a message contains information.

For example, after taking a credit test or an exam, you are tormented by uncertainty: you do not know what grade you received. Finally, the teacher announces the results, and for the credit test you receive one of two information messages, "pass" or "fail", while for the exam you receive one of four information messages: "2", "3", "4", or "5".

The announcement of the grade for the credit test halves the uncertainty of your knowledge, since one of two possible information messages is received. The information message about the grade for the exam leads to a fourfold decrease in the uncertainty of your knowledge, since one of four possible information messages is received.

It is clear that the more uncertain the initial situation (the greater the number of possible information messages), the more new information we receive from an information message (the more times the uncertainty of knowledge decreases).

The amount of information can be considered as a measure of the reduction in the uncertainty of knowledge upon receipt of information messages.

The approach discussed above to information as a measure of reducing the uncertainty of knowledge makes it possible to measure information quantitatively. There is a formula that relates the number of possible information messages N and the amount of information I that the received message carries:

N = 2^I (1.1)

Bit. To measure any quantity, one must first determine a unit of measurement. Thus, for measuring length the meter is chosen as the unit, for measuring mass the kilogram, etc. Similarly, to determine the amount of information, a unit of measurement must be introduced.

The unit of information is taken to be the amount of information contained in an information message that reduces the uncertainty of knowledge by half. This unit is called the bit.

If we return to the information message about the credit test result considered above, there the uncertainty is reduced exactly by half, and therefore the amount of information that the message carries is equal to 1 bit.

Derived units for measuring the amount of information. The minimum unit for measuring the amount of information is the bit, and the next largest unit is the byte:

1 byte = 8 bits = 2^3 bits.

In computer science, the way multiple units of measurement are formed differs somewhat from that accepted in most sciences. In traditional metric systems, such as the International System of Units (SI), a factor of 10^n is used as the multiplier for multiple units, where n = 3, 6, 9, etc., corresponding to the decimal prefixes "kilo" (10^3), "mega" (10^6), "giga" (10^9), etc.

In a computer, information is encoded using the binary sign system, and therefore the coefficient 2^n is used in multiple units of measuring the amount of information.

So, units of measurement of the amount of information that are multiples of a byte are entered as follows:

1 kilobyte (KB) = 2^10 bytes = 1024 bytes;

1 megabyte (MB) = 2^10 KB = 1024 KB;

1 gigabyte (GB) = 2^10 MB = 1024 MB.

Test questions

    1. Give examples of information messages that lead to a decrease in the uncertainty of knowledge.
    2. Give examples of information messages that carry 1 bit of information.

Determining the amount of information

Determining the number of information messages. According to formula (1.1), one can easily determine the number of possible information messages if the amount of information is known. For example, in an exam you draw an exam ticket, and the teacher reports that the visual information message about its number carries 5 bits of information. If you want to determine the number of exam tickets, it is enough to determine the number of possible information messages about their numbers using formula (1.1):

N = 2^I = 2^5 = 32.

Thus, the number of exam tickets is 32.

Determining the amount of information. Conversely, if the possible number of information messages N is known, then to determine the amount of information that a message carries, it is necessary to solve equation (1.1) for I.

Imagine that you control the movement of a robot and can set the direction of its movement using information messages: "north", "northeast", "east", "southeast", "south", "southwest", "west", and "northwest" (Fig. 1.11). How much information will the robot receive after each message?

There are 8 possible information messages in total, so formula (1.1) takes the form of an equation for I:

8 = 2^I.

We decompose the number 8 on the left side of the equation into factors and represent it in power form:

8 = 2 × 2 × 2 = 2^3.

Our equation becomes:

2^3 = 2^I.

The left and right sides of the equation are equal if the exponents of the number 2 are equal. Thus, I = 3 bits, i.e. the amount of information that each information message carries to the robot is 3 bits.
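The robot example can also be illustrated in code. The Python sketch below (the particular 3-bit codes assigned to the directions are an illustrative choice of mine, not prescribed by the text) confirms that 8 equally likely messages require log2(8) = 3 bits each:

```python
# 8 possible direction messages -> log2(8) = 3 bits per message.
import math

directions = ["north", "northeast", "east", "southeast",
              "south", "southwest", "west", "northwest"]

bits = math.log2(len(directions))
print(bits)                                  # 3.0 bits per message

for code, direction in enumerate(directions):
    print(f"{code:03b} -> {direction}")      # one possible 3-bit encoding
```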

Alphabetical approach to determining the amount of information

With an alphabetical approach to determining the amount of information, one abstracts from the content of information and considers an information message as a sequence of signs of a certain sign system.

Information capacity of the sign. Imagine that it is necessary to transmit an informational message over an information transmission channel from the sender to the recipient. Let the message be encoded using a sign system, the alphabet of which consists of N characters (1, ..., N). In the simplest case, when the length of the message code is one character, the sender can send one of N possible messages "1", "2", ..., "N", which will carry the amount of information I (Fig. 1.5).

Fig. 1.5. Transfer of information

Formula (1.1) links the number of possible information messages N and the amount of information I that the received message carries. In the situation under consideration, N is the number of characters in the alphabet of the sign system, and I is the amount of information that each character carries:

N = 2^I.

Using this formula, you can, for example, determine the amount of information that a sign carries in a binary sign system:

N = 2 => 2 = 2^I => 2^1 = 2^I => I = 1 bit.

Thus, in a binary sign system, one sign carries 1 bit of information. Interestingly, the unit of measurement of the amount of information, the "bit", got its name from the English phrase "Binary digiT", "binary digit".

The information capacity of a sign of a binary sign system is 1 bit

The more characters the alphabet of the sign system contains, the more information one character carries. As an example, let's determine the amount of information that a letter of the Russian alphabet carries. The Russian alphabet includes 33 letters, but in practice only 32 letters are often used to send messages (the letter "ё" is excluded).

Using formula (1.1), we determine the amount of information that a letter of the Russian alphabet carries:

N = 32 => 32 = 2^I => 2^5 = 2^I => I = 5 bits.

Thus, a letter of the Russian alphabet carries 5 bits of information (with an alphabetical approach to measuring the amount of information).

The amount of information that a sign carries depends on the probability of receiving it. If the recipient knows in advance exactly which character will arrive, then the received amount of information will be equal to 0. Conversely, the less likely the receipt of the character, the greater its information capacity.

In Russian written speech, the frequency of use of letters in a text varies: on average, there are 200 letters "a" per 1000 characters of meaningful text, and a hundred times fewer letters "f" (only 2). Thus, from the point of view of information theory, the information capacities of the characters of the Russian alphabet differ (the letter "a" has the smallest, and the letter "f" the largest).

The amount of information in the message. The message consists of a sequence of characters, each of which carries a certain amount of information.

If the characters carry the same amount of information, then the amount of information I in the message can be calculated by multiplying the amount of information i that one character carries by the code length (the number of characters in the message) K:

I = i × K

So, each digit of a binary computer code carries 1 bit of information. Therefore, two digits carry 2 bits of information, three digits carry 3 bits, and so on. The amount of information in bits is equal to the number of digits in the binary computer code (Table 1.1).

Table 1.1. The amount of information that a binary computer code carries
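The idea of Table 1.1 can be sketched in a few lines of Python (the codes below are illustrative examples of mine, not the book's exact table):

```python
# An n-digit binary code carries n bits of information.
for code in ["1", "01", "101", "1001", "10001"]:
    print(f"code {code:>5}  ->  {len(code)} bit(s)")
```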

Topic: "Measuring information"

Formulas

Two formulas are required to determine the information content of a message:

1. \(N= 2^i \)

N is the cardinality of the alphabet

2. \(I = k * i\)

I - information volume of the message

k - number of characters in the message

i - information volume of one character in the alphabet

Formula for finding k: \(k = \frac{I}{i}\)

Formula for finding i: \(i = \frac{I}{k}\)

Tasks

Task number 1. A message written with letters from a 128-character alphabet contains 30 characters. Find the information volume of the entire message.

Solution. We write down what is given and what needs to be found:

\(N = 128\), \(k = 30\)

\(I = ?\)

\(i = ?\)

\(N = 2^i\): \(128 = 2^7\)

\(i = 7\) bits. Whatever power of two N is, that is the weight of one character of the alphabet. Next, we determine the information volume of the message by the formula:

\(I = k * i = 30 * 7 = 210\) bits

Answer: 210 bits

Task number 2. An information message of 4 KB contains 4096 characters. How many characters are in the alphabet with which this message was written?

Solution. We write down what is given in the problem statement and what needs to be found:

\(I = 4\) KB

\(k = 4096\)

\(N = ?\)

\(i = ?\)

It is very important to convert all numbers to powers of two:

1 KB = \(2^{13}\) bits

\(I = 4\) KB \(= 2^2 \cdot 2^{13} = 2^{15}\) bits

\(k = 4096 = 2^{12}\)

First, find the weight of one character using the formula:

\(i = \frac{I}{k} = 2^{15} : 2^{12} = 2^3 = 8\) bits

\(N = 2^i = 2^8 = 256\)

Answer: 256 characters in the alphabet.

Task number 3. How many characters does a message written using the 16-character alphabet contain if its size is 1/16 MB?

Solution. We write down what is given in the problem statement and what needs to be found:

\(I = \frac{1}{16}\) MB

\(N = 16\)

\(k = ?\)

\(i = ?\)

Express \(I = \frac{1}{16}\) MB as a power of two:

1 MB = \(2^{23}\) bits

\(I = \frac{1}{16}\) MB \(= 2^{23} : 2^4 = 2^{19}\) bits

First, find the weight of one character using the formula:

\(N = 2^i\): \(2^4 = 16\), so \(i = 4\) bits \(= 2^2\)

Now let's find the number of characters in the message k:

\(k = \frac{I}{i} = 2^{19} : 2^2 = 2^{17} = 131072\)

Answer: 131072 characters in the message.
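All three tasks rest on the same two formulas, N = 2^i and I = k * i, so they can be checked with one short Python sketch (the helper name is my own, illustrative choice):

```python
# Check Tasks 1-3 using N = 2^i and I = k * i.
import math

def char_weight_bits(N: int) -> int:
    """i from N = 2^i (N must be a power of two)."""
    return int(math.log2(N))

# Task 1: N = 128, k = 30  ->  I
print(30 * char_weight_bits(128))                 # 210 bits

# Task 2: I = 4 KB, k = 4096  ->  N
I_bits = 4 * 1024 * 8                             # 2^15 bits
i = I_bits // 4096                                # 8 bits per character
print(2 ** i)                                     # 256 characters in the alphabet

# Task 3: N = 16, I = 1/16 MB  ->  k
I_bits = (1024 * 1024 * 8) // 16                  # 2^19 bits
print(I_bits // char_weight_bits(16))             # 131072 characters
```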

Information volume of text and units of measurement of information


A modern computer can process numerical, textual, graphic, sound, and video information. All these types of information are represented in a computer in binary code, that is, using only two symbols, 0 and 1. This is because it is convenient to represent information as a sequence of electrical impulses: no impulse (0), impulse (1).

Such coding is usually called binary, and the logical sequences of zeros and ones themselves are called machine language.

How long must a binary code be in order to encode all the characters of a computer keyboard?

The keyboard alphabet, together with service characters, contains about 256 = 2^8 characters, so an 8-digit binary code is needed. In this way, the information weight of one character of such a sufficient alphabet is 1 byte.

Larger units of information are used to measure large information volumes:

Units for measuring the amount of information:

1 byte = 8 bits

1 kilobyte = 1 KB = 1024 bytes

1 megabyte = 1 MB = 1024 KB

1 gigabyte = 1 GB = 1024 MB

Information volume of the text

A task:

A book typed on a computer contains 150 pages; each page has 40 lines, and each line has 60 characters. Find the information volume of the book.

1. Number of characters in the book:

60 * 40 * 150 = 360,000 characters.

2. Because 1 character weighs 1 byte, the information volume of the book is

360,000 bytes.

3. Convert bytes to larger units:

360,000 / 1024 ≈ 351.56 KB

351.56 / 1024 ≈ 0.34 MB

Answer: The information volume of the text is 0.34 MB.

A task:

The information volume of a text prepared with the help of a computer is 3.5 MB. How many characters does this text contain?

1. Convert the volume from MB to bytes:

3.5 MB * 1024 = 3584 KB

3584 KB * 1024 = 3,670,016 bytes

2. Since 1 character weighs 1 byte, the number of characters in the text is 3,670,016.

The alphabetical approach is used to measure the amount of information in a text represented as a sequence of characters of some alphabet. This approach is not related to the content of the text. The amount of information in this case is called the information volume of the text and is proportional to the size of the text, i.e. to the number of characters that make it up. Sometimes this approach to measuring information is called the volumetric approach.

Each character of the text carries a certain amount of information, which is called the information weight of the character. Therefore, the information volume of the text is equal to the sum of the information weights of all the characters that make up the text.

It is assumed here that the text is a consecutive chain of numbered characters, so that

I = i_1 + i_2 + ... + i_K (1)

In formula (1), i_1 denotes the information weight of the first character of the text, i_2 the information weight of the second character, and so on; K is the size of the text, i.e. the total number of characters in the text.

The whole set of different characters used to write texts is called the alphabet. The size of the alphabet is an integer called the power of the alphabet. It should be borne in mind that the alphabet includes not only the letters of a certain language, but all other characters that can be used in the text: digits, punctuation marks, various brackets, spaces, etc.

The information weights of characters can be determined in two approximations:

1) under the assumption of equal probability (same frequency of occurrence) of any character in the text;

2) taking into account the different probability (different frequency of occurrence) of various characters in the text.

Approximation of equal probability of characters in a text

If we assume that all characters of the alphabet occur in any text with the same frequency, then the information weight of all characters will be the same. Let N be the power of the alphabet. Then the proportion of any character in the text is 1/N of the text. By the definition of probability, this value is equal to the probability of the character occurring in each position of the text:

p = 1/N

According to K. Shannon's formula (see "Measurement of information. Content approach"), the amount of information that a character carries is calculated as follows:

i = log_2(1/p) = log_2 N (bits) (2)

Therefore, the information weight of a character (i) and the power of the alphabet (N) are related by Hartley's formula (see "Measurement of information. Content approach"):

2^i = N.

Knowing the information weight of one character (i) and the size of the text expressed in the number of characters (K), one can calculate the information volume of the text using the formula:

I = K · i (3)

This formula is a particular version of formula (1), in the case when all symbols have the same information weight.

From formula (2) it follows that for N = 2 (the binary alphabet), the information weight of one character is 1 bit.

From the standpoint of the alphabetical approach to measuring information, 1 bit is the information weight of a character from the binary alphabet.

A larger unit of information is the byte.

1 byte is the information weight of a character from an alphabet with a power of 256.

Since 256 = 2^8, the connection between the bit and the byte follows from Hartley's formula:

2^i = 256 = 2^8

From here: i = 8 bits = 1 byte

To represent texts stored and processed in a computer, an alphabet with a power of 256 characters is most often used. Consequently, 1 character of such a text "weighs" 1 byte.

In addition to the bit and byte, larger units are also used to measure information:

1 KB (kilobyte) = 2^10 bytes = 1024 bytes,

1 MB (megabyte) = 2^10 KB = 1024 KB,

1 GB (gigabyte) = 2^10 MB = 1024 MB.

Approximation of different probability of occurrence of characters in the text

This approximation takes into account that in a real text different characters occur with different frequencies. It follows that the probabilities of the appearance of different characters in a certain position of the text are different and, therefore, their information weights are different.

Statistical analysis of Russian texts shows that the frequency of the letter "o" is 0.09. This means that for every 100 characters, the letter "o" occurs on average 9 times. The same number indicates the probability of the letter "o" appearing in a certain position of the text: p_o = 0.09. It follows that the information weight of the letter "o" in Russian text is:

i_o = log_2(1/0.09) ≈ 3.47 bits

The rarest letter in texts is the letter "f". Its frequency is 0.002. From here:

i_f = log_2(1/0.002) ≈ 8.97 bits

A qualitative conclusion follows from this: the information weight of rare letters is greater than the weight of frequently occurring letters.

How to calculate the information volume of a text, taking into account the different information weights of the characters of the alphabet? This is done according to the following formula:

I = n_1 · i_1 + n_2 · i_2 + ... + n_N · i_N

Here N is the size (power) of the alphabet; n_j is the number of repetitions of character number j in the text; i_j is the information weight of character number j.

Alphabetical approach in the computer science course in basic school

In the computer science course in basic school, students usually first become acquainted with the alphabetical approach to measuring information in the context of the computer representation of information. The main statement goes like this:

The amount of information is measured by the size of the binary code with which this information is represented.

Since any kind of information is represented in computer memory in binary form, this definition is universal. It is valid for symbolic, numerical, graphic, and sound information.

One character (digit) of binary code carries 1 bit of information.

When explaining how to measure the information volume of a text in the basic computer science course, this question is developed through the following sequence of concepts: alphabet → binary code of a character → code size → information volume of the text.

The logic of reasoning unfolds from particular examples to a general rule. Let the alphabet of some language contain only 4 characters. These characters can be encoded using the four two-digit binary codes 00, 01, 10, 11. Here all arrangements of two characters taken two at a time are used, and their number is 2^2 = 4. Thus, the information weight of a character of a 4-character alphabet is 2 bits.

The next particular case is an 8-character alphabet, each character of which can be encoded with a 3-digit binary code, since the number of arrangements of two characters in groups of three is 2^3 = 8. Therefore, the information weight of a character of an 8-character alphabet is 3 bits. And so on.

Generalizing the particular examples, we obtain the general rule: using a b-digit binary code, one can encode an alphabet consisting of N = 2^b characters.
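This general rule can be demonstrated by enumerating the codes directly. A minimal Python sketch (illustrative only, with b = 3 as in the 8-character example above):

```python
# A b-digit binary code distinguishes exactly N = 2^b alphabet characters.
from itertools import product

b = 3
codes = ["".join(bits) for bits in product("01", repeat=b)]
print(len(codes))   # 8 = 2**3 characters can be encoded
print(codes)        # ['000', '001', '010', '011', '100', '101', '110', '111']
```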

Example 1. To write a text, only the lowercase letters of the Russian alphabet are used, and a "space" is used to separate words. What is the information volume of a text consisting of 2000 characters (one printed page)?

Solution. There are 33 letters in the Russian alphabet. Removing two of them (for example, "ё" and "й") and introducing a space character, we get a very convenient number of characters: 32. Using the approximation of equal probability of characters, we write Hartley's formula:

2^i = 32 = 2^5

From here: i = 5 bits, the information weight of each character of the Russian alphabet. Then the information volume of the entire text is:

I = 2000 × 5 = 10,000 bits

Example 2. Calculate the information volume of a text of 2000 characters, written in the alphabet of computer text representation with a power of 256.

Solution. In this alphabet, the information weight of each character is 1 byte (8 bits). Therefore, the information volume of the text is 2000 bytes.

In practical tasks on this topic, it is important to develop students' skills in converting amounts of information between different units: bits, bytes, kilobytes, megabytes, gigabytes. If we convert the information volume of the text from Example 2 into kilobytes, we get:

2000 bytes = 2000/1024 ≈ 1.9531 KB

Example 3. The volume of a message containing 1024 characters is 1/512 of a megabyte. What is the size of the alphabet in which the message is written?

Solution. Let's convert the information volume of the message from megabytes to bits. To do this, multiply this value by 1024 twice (obtaining bytes), and then by 8:

I = 1/512 × 1024 × 1024 × 8 = 16,384 bits.

Since this amount of information is carried by 1024 characters (K), one character accounts for:

i = I/K = 16,384/1024 = 16 bits.

It follows that the size (power) of the alphabet used is 2^16 = 65,536 characters.

Volumetric approach in the computer science course in high school

When studying computer science in grades 10–11 at the basic general education level, students may keep their knowledge of the volumetric approach at the level described above, i.e. in the context of the size of binary computer code.

When studying computer science at the profile level, the volumetric approach should be considered from more general mathematical positions, using ideas about the frequency of characters in a text, about probabilities, and about the relationship of probabilities to the information weights of characters.

Knowledge of these issues is important for a deeper understanding of the difference between uniform and non-uniform binary coding (see "Information Encoding"), for understanding some data compression techniques (see "Data Compression"), and for cryptographic algorithms (see "Cryptography").

Example 4. The alphabet of the MUMU tribe contains only 4 letters (A, U, M, K) and one punctuation mark (the dot), and a space is used to separate words. It was calculated that the popular novel "Mumuka" contains 10,000 characters in total, of which: letters A: 4000; letters U: 1000; letters M: 2000; letters K: 1500; dots: 500; spaces: 1000. How much information does the book contain?

Solution. Since the volume of the book is quite large, it can be assumed that the frequency of occurrence of each alphabet character calculated from it is typical for any text in the MUMU language. Let's calculate the frequency of occurrence of each character in the entire text of the book (i.e. its probability) and the information weight of each character:

p_A = 0.4, i_A ≈ 1.32 bits; p_U = 0.1, i_U ≈ 3.32 bits; p_M = 0.2, i_M ≈ 2.32 bits; p_K = 0.15, i_K ≈ 2.74 bits; p_dot = 0.05, i_dot ≈ 4.32 bits; p_space = 0.1, i_space ≈ 3.32 bits.

The total amount of information in the book is calculated as the sum of the products of the information weight of each character and the number of repetitions of that character in the book:

I = 4000 · i_A + 1000 · i_U + 2000 · i_M + 1500 · i_K + 500 · i_dot + 1000 · i_space ≈ 22,841 bits ≈ 2.8 KB.
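The arithmetic of Example 4 can be verified with a short Python sketch (the character counts are taken from the example; the code itself is my own illustration, not part of the original article):

```python
# Verify Example 4: i_j = log2(1/p_j), I = sum(n_j * i_j).
import math

counts = {"A": 4000, "U": 1000, "M": 2000, "K": 1500, ".": 500, " ": 1000}
total = sum(counts.values())                      # 10,000 characters

I = 0.0
for symbol, n in counts.items():
    p = n / total                                 # frequency = probability
    i = math.log2(1 / p)                          # information weight, bits
    print(f"{symbol!r}: p = {p:.2f}, i = {i:.2f} bits")
    I += n * i

print(f"I = {I:.0f} bits = {I / 8 / 1024:.2f} KB")  # about 22,842 bits ≈ 2.79 KB
```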
