Page encoding examples and encoding errors. HTML encoding

One of the most common problems faced by beginner webmasters (and not only beginners) is website encoding. Even I regularly see "abracadabra" (garbled text) when creating sites. Fortunately, I know exactly how to solve this problem, so I put everything in order within a few seconds. In this article I will try to teach you to solve encoding problems on a site just as quickly.

The first thing to note is that every appearance of "abracadabra" comes from a mismatch between the encoding of the document and the encoding the browser applies. Let's say the document is in windows-1251, and the browser for some reason applies UTF-8. The following reasons can be the source of such a discrepancy.
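To see what such a mismatch does to the text, here is a minimal Python sketch. The sample string and variable names are illustrative assumptions. It takes bytes written in windows-1251 and reads them back as UTF-8, roughly the way a misinformed browser would:

```python
# Text saved by the editor in windows-1251 (Python calls this codec "cp1251").
original = "Привет, мир"
raw_bytes = original.encode("cp1251")

# A strict UTF-8 reader rejects these bytes outright...
try:
    raw_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while a lenient reader substitutes U+FFFD for whatever it cannot
# decode, so the user sees replacement characters instead of letters.
as_seen = raw_bytes.decode("utf-8", errors="replace")

# Reading the same bytes with the right encoding restores the text.
correct = raw_bytes.decode("cp1251")

print(strict_ok)
print(as_seen)
print(correct)
```

The bytes on disk never change here; only the table used to interpret them does, which is why fixing the declared encoding is usually enough.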

First reason

Invalid meta tag content-type. Be careful, it should always contain the encoding in which your document is written.

Second reason

The meta tag seems to be written the way you want, and the browser applies exactly the encoding you want, but there is still a problem with the encoding. Here, almost certainly, the document itself is saved in a different encoding. If you are working in Notepad++, the name of the current document's encoding is shown at the bottom right (for example, ANSI). If the meta tag says UTF-8 but the document itself is written in ANSI, convert it to UTF-8 (via the "Encodings" menu, item "Convert to UTF-8 without BOM").
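The same conversion can be scripted when there are many files. The following is a rough sketch, assuming that "ANSI" on the machine in question means windows-1251 (it depends on the system locale); the function names `recode` and `convert_file` are made up for illustration:

```python
from pathlib import Path


def recode(data: bytes, source_encoding: str = "cp1251") -> bytes:
    """Return the same text re-encoded as UTF-8 without a BOM."""
    text = data.decode(source_encoding)
    # The plain "utf-8" codec never writes a BOM (unlike "utf-8-sig").
    return text.encode("utf-8")


def convert_file(path: str, source_encoding: str = "cp1251") -> None:
    """Rewrite the file at `path` in place as UTF-8 without a BOM."""
    file = Path(path)
    file.write_bytes(recode(file.read_bytes(), source_encoding))
```

Keep a backup before running something like `convert_file("index.html")`: if the file was not really in the assumed source encoding, the result will be garbled in a new way.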

Third reason

Fourth reason

And finally, the last popular reason is a database encoding problem. First, make sure that all your tables and fields use the same encoding, and that it matches the encoding of the rest of the site. If this does not help, then immediately after connecting to the database in your script, run the following query:

SET NAMES "utf8"

Instead of "utf8" there may be a different encoding. After that, all data from the database should be output in the correct encoding.

In this article, I hope I have covered at least 90% of the problems that cause "abracadabra" to appear on a site. Now you should be able to deal with such a common and simple problem as incorrect encoding in a jiffy.

In the first chapter of this tutorial, on the general structure of an HTML document, I said that every HTML document should follow this code template:

<html> - the beginning of the document
<head> - the beginning of the head
</head> - head closure
<body> - the beginning of the body
</body> - body closure
</html> - end of document

The information between the <body> tags is what gets displayed on the screen in the form we need, while the <head> tags hold purely service information intended for search engines and browsers. So what is this information and why is it needed? I will answer systematically, piece by piece, in this chapter.

We are already familiar with the <title> tag: it sets the name of the document shown in the page title. Now for a new tag, <meta> (it does not require a closing tag), which we use to put this service information on the page.

The <meta> tag has the following attributes:

  • http-equiv - tells the browser how to process the main content of the document, or more precisely, on the basis of what data.
  • name - an informational name (used together with the content attribute).
  • content - the information associated with the meta name (name).

Now let's get to the heart of the matter with examples.

Character encoding and language

An example (very necessary and important):

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1251">

First, why this line is needed in the head of the HTML document. It tells the browser the content type of the page (an HTML document) and the encoding in which it was written, in this case Cyrillic for Windows. If this line is missing from the page head, there is a high probability that some users' browsers will display all the text on your page as "hieroglyphs" incomprehensible to a person. Of course, the user can apply the browser command View->Encoding->Cyrillic to such a document, but they may not know about this function, and why bother a person with it?

Now let's analyze our record piece by piece:

<meta http-equiv="Content-Type" - indicates that this meta tag deals with content-type, the type of content
content="text/html; - namely, HTML text
charset=Windows-1251"> - the character set, Cyrillic for Windows, where 1251 is the number of the code page; a Western European (for example, English) page would use charset=Windows-1252

Currently, advanced webmasters recommend using the UTF-8 encoding. That is, write in the head of the document like this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<meta http-equiv="Content-Language" content="ru">

This line says that the language of the document is Russian (content="ru").

Setting the language and encoding incorrectly can lead to unfortunate consequences.

Document Information

<meta name="author" content="Остап Бендер">
<meta name="copyright" content="&quot;Рога и копыта&quot; Остап Бендер">

These meta descriptors are intended as an authorship statement directly in the head of the HTML code: name="author" specifies the name of the author of the page, and name="copyright" the copyright holder, which may be the full name of the site's author, the name of the company, a brand, and so on. In addition, including such a description in the head of the document may help a search engine find your site by author name, company name or brand.

<meta name="Generator" content="Microsoft Notepad">

If you want, you can specify which HTML editor the page was written with.

Page description and keywords

<meta name="description" content="We purchase horns and hooves at competitive prices!">

description - a short description of the page. Search engines often use it in search results to show information about the site and its purpose.

<meta name="keywords" content="рога, копыта, рожки, рог, копыто, копытце, закупка, покупка, приобретение, выгодно, продать, купить, сбыть, реализовать, корова, бык, коровьи, бычьи, оплата, деньги, наличные, цена, цене">

keywords - the keywords of the web page, again intended for search engines.

Imagine that you are using a search engine to find a site where you can sell those same horns and hooves :) What words and phrases will you enter in the "Search" line? Probably something like "Where to sell cow horns?" or "Sell hooves at a bargain price". So if you define keywords well and predict the thoughts of a potential visitor, you can hope that a search engine will show a link to your site in the first lines of the results. Of course, entering this meta description is no guarantee that your site will take first place for these words, but you should not neglect it. However, this is a separate topic for discussion.

Please note that the description should not exceed 200 characters in length, and the keywords 1000 characters, otherwise it may adversely affect the promotion of your site to the top of the search results.

Address

<meta name="Publisher-Email" content="your_e-mail@server.domain">
<meta name="Publisher-URL" content="http://www.your_site/">

I think this is self-explanatory: Publisher-Email is the address of your mailbox and Publisher-URL is the address of your website.

Page update

<meta name="revisit-after" content="15 days">

If a certain page on your site is constantly updated or extended, it would be good to include this description in the head of that page. It asks the robot to visit your site in a timely manner and index its content. In our example, we stated that we are going to update the content of the page at least once every 15 days, so a robot that honors this tag will come back once every fifteen days to check whether anything has changed. Treat it as a hint: not every search robot takes it into account.

Document expiration time and cache

To speed up page loading and save traffic, modern browsers save the pages visited by the user in the cache (on the hard disk), and on a repeat visit load them not from the server but directly from the cache. This is a nice feature, but there is one "but": the browser may display outdated information. Imagine, for example, that your site is a periodic online news publication, and instead of the latest news the user receives the outdated copy stored in the cache! Without understanding what the trouble is, they will take your site for a dead one, abandoned and not updated by anyone.

To force the browser to load a page not from the hard disk but from the server, a meta tag with http-equiv="Expires" is needed, whose content gives the day of the week, day, month, year, time (hh:mm:ss) and time zone (GMT+03:00 is Moscow time, three hours ahead of GMT). The day of the week and the time of day can be omitted. Now, when the browser reads the page, it will load it from the server if the specified date and time has arrived or passed, and from the cache if the specified time has not yet arrived.

Below, just in case, are the English abbreviations for months and days of the week:

Months: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
Days of the week: Mon, Tue, Wed, Thu, Fri, Sat, Sun

The content attribute can also be set to "0":

<meta http-equiv="Expires" content="0">

In this case the page will always be loaded from the server.

And one more thing: some search robots may refuse to index a document with a knowingly outdated date. Don't tempt fate.

<meta http-equiv="pragma" content="no-cache">

This entry prohibits the browser from caching the page altogether.

Robot Commands

<meta name="robots" content="index,follow">

This meta tag is designed to give the search robot a particular command.

List of possible commands for the robot:

  • index - index the page
  • noindex - do not index the page
  • follow - follow hyperlinks on the page
  • nofollow - do not follow hyperlinks on the page
  • all - index the page and follow hyperlinks on the page (default)
  • none - do not index the page and do not follow hyperlinks on the page

Automatic transition to another page

<meta http-equiv="Refresh" content="10; URL=http://www.mysite/index.html">

If for some reason you decide to change the URL of your site, it is a good idea to leave a page like this at the old address. The page begins like this:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1251">
<meta http-equiv="Refresh" content="10; URL=http://www.mysite/index.html">
<title>Forwarding</title>
</head>

and shows the following text in its body:



The site address has been changed, after 10 seconds your browser will be automatically redirected to the new address:
http://www.mysite.ru/
Click here to complete the transition immediately.
We apologize for any inconvenience caused.


Let's analyze the line from the example:

<meta http-equiv="Refresh" - Refresh tells the browser that this page needs to be refreshed
content="10; - refresh after the given number of seconds (ten in our case)
URL=http://www.mysite/index.html"> - the address of the new page to go to.

But if the URL is omitted from the Refresh header, for example content="30", the browser will reload the current page every 30 seconds (or however many seconds you specify).

This method is widely used in news feeds, where information flows, so to speak, and requires constant updating.
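The two forms of the Refresh content value discussed above (a delay plus a URL, or a delay alone) can be told apart with a few lines of code. This is only an illustrative sketch, not how any browser implements it; the function name `parse_refresh` is an assumption:

```python
def parse_refresh(content):
    """Split a Refresh content value into (delay_seconds, url_or_None)."""
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    rest = rest.strip()
    url = None
    # The part after the semicolon, if any, has the form "URL=<address>".
    if rest.lower().startswith("url="):
        url = rest[4:].strip() or None
    return delay, url


print(parse_refresh("10; URL=http://www.mysite/index.html"))
print(parse_refresh("30"))
```

A result with no URL means "reload this same page", which is the news-feed case.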

Effects when clicking on a link


These meta headers create visual effects when moving from one page to another.

  • Page-Enter - the effect shown when the page appears
  • Page-Exit - the effect shown when the page is left

In which:

  • duration - effect duration in seconds
  • transition - the number of one of the available effects (from 0 to 23) listed in the table:
Number  Effect description
0       Box in
1       Box out
2       Circle in
3       Circle out
4       Wipe up
5       Wipe down
6       Wipe right
7       Wipe left
8       Vertical blinds
9       Horizontal blinds
10      Checkerboard across
11      Checkerboard down
12      Random dissolve
13      Split vertical in
14      Split vertical out
15      Split horizontal in
16      Split horizontal out
17      Strips left down
18      Strips left up
19      Strips right down
20      Strips right up
21      Random horizontal bars
22      Random vertical bars
23      Random effect selection

page1.html file (text shown on the page)

Page Transition Effects

On a note: Transition effects from one page to another do not work in all browsers.

"Jump"

page2.html file (text shown on the page)

Page Transition Effects

On a note: The effects of opening and closing web pages are visible only when navigating from one page to another or when using the "Back" and "Forward" buttons. When a page is opened for the first time, or reloaded, transition effects are not visible.

Click "Jump" to go to the next page and evaluate the effect of the transition from one page to another.

"Jump"


Let me remind you once again that meta tags should be used skillfully and competently, especially when it comes to robot commands and character encoding, otherwise all your work may go down the drain.

The Refresh header (automatic jump to another page) can also be used in a not quite standard way. Some authors use it to create a kind of "presentation" or slide show, where the changing pages are the frames. Imagine a person comes to such a site and is told "Lean back in your chair and relax..." :) and then pictures, graphics and texts start changing by themselves, and the last page is a dead end where the user takes the site "into their own hands" (or perhaps closes it on the first page). Just always remember the golden rule of the webmaster: the main thing is not to overdo it!

How do you set the site encoding so that the browser determines it correctly and does not show you garbled characters ("krakozyabry", as they are called in Russian), such as:

Р-аказать сайт Сѓ РSR°СЃ - это создать сайт недорого Рё РєРхачесЁµС‚

In HTML, the meta tag is used to indicate the encoding:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

The most common encodings for the Russian language that can be specified in the document header:

Windows-1251 - Cyrillic (Windows).
KOI8-r - Cyrillic (KOI8-R)
cp866 - Cyrillic (DOS).
Windows-1252 - Western Europe (Windows).
Windows-1250 - Central Europe (Windows).
UTF-8 - Unicode, a variable-length encoding (1 to 4 bytes per character; 2 bytes for a Cyrillic letter)

Now consider specifying the default encoding through the .htaccess file (if this file does not exist, you need to create it, the file name starts with a dot)

AddDefaultCharset sets the default character table (encoding) for all rendered pages on the Apache web server

Just add one line. For UTF-8:

AddDefaultCharset UTF-8

or, for windows-1251:

AddDefaultCharset WINDOWS-1251

Just one line, and the server will deliver the page to the user in the correct encoding, regardless of the browser's own preferences. The site encoding will be the same for all browsers.

When uploading a file to the server, conversion is possible. We indicate that all received files will have windows-1251 encoding, for this we will write.

In this article, I will try to dot the i's in the choice of encoding for the HTML pages you create.

When I first started building websites, I constantly had problems because of these encodings. You save the HTML page, upload it to the server, open it, and bang: garbled characters. Well hello, here we go.

Or everything is fine in the debugging environment (for example, a local development environment), but on the hosting those damned garbled characters (krakozyabry) are brazenly staring at me again.

And how much torment there was with site engines. Suddenly, for no clear reason, native Russian letters turn into gibberish.

Now we will deal with this matter in detail, and you will know exactly which encoding to save an HTML page in and with which tools.

To strengthen our mutual understanding, let us define the concept of encoding. An encoding is a table of correspondence between machine codes and alphabetic characters. There is some sequence of machine codes that a smart computer, in accordance with the selected code table, replaces with letters that we understand.

In the 90s of the last century (what an antiquity, but I still remember the 1991 calendar on the wall) there were 4 encodings for the PC and one more, its own, for the Mac. The irony of fate lies in the fact that in all these encodings, Latin characters were matched to machine codes using the same algorithm, but regarding the Cyrillic alphabet, each of the encodings had its own opinion.

All this confusion led to the appearance of krakozyabry. For example, if the word "Вопрос" ("Question"), typed in windows-1251 encoding, is displayed using KOI8-R, you get "бНОПНЯ".
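The windows-1251 versus KOI8-R mix-up is easy to reproduce. A small Python sketch follows; the variable names are illustrative, and `cp1251` and `koi8_r` are Python's names for the two code tables:

```python
word = "Вопрос"                   # "Question"
stored = word.encode("cp1251")    # the bytes as saved in windows-1251
shown = stored.decode("koi8_r")   # the same bytes interpreted as KOI8-R

print(shown)
```

Both tables assign Cyrillic letters to the same range of byte values, just in a different order, so the result is still Cyrillic but reads as nonsense.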

Thank God, the 1990s are far behind us, and of those five encodings only two remain in common use. But even that is enough for a novice webmaster to get lost, as the Russian saying goes, among two pine trees. Don't worry, I'll lead you out of this forest now!

At the moment the choice for HTML document encodings stands between windows-1251 and utf-8. And now attention: utf-8 is much richer, more powerful and the future lies with it. So we will save our HTML files in utf-8.

Let me justify my words ;). UTF-8 can represent characters such as ↓, which windows-1251 cannot show; a substitute character appears in their place. UTF-8 also has the euro sign; it lets you combine characters specific to languages such as Georgian, Hebrew, Chinese and Japanese in one HTML file; and using utf-8 for HTML is simply good practice.
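Whether a given encoding can hold a given piece of text is easy to check by simply trying to encode it. A minimal sketch, with a hypothetical helper name `can_encode`:

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for sample in ["Привет", "↓", "中文"]:
    print(sample, can_encode(sample, "cp1251"), can_encode(sample, "utf-8"))
```

Anything that fails this check for windows-1251 would have to be written as an HTML entity or replaced, whereas UTF-8 accepts all three samples.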

I hope I have convinced you and you will use Unicode (by the way, "utf-8" and "Unicode" are often used as synonyms; to be precise, utf-8 is one of the encodings of the Unicode family, the one that has become popular among web developers).

Now let's take a closer look at the file transcoding tools that I recommend you use, dear reader.

Tools for working with HTML file encodings

Actually, there are only three of them:

  • PSPad. Free text editor, my favorite.
  • Notepad++. Another good text editor and also free.
  • Dreamweaver. You may already be familiar with it.

Open some HTML file in PSPad. How can we tell which encoding the loaded test subject has? Very simply: it is written plainly in the status bar (at the bottom).

Encoding open html file windows-1251

And now, creating a new HTML document, let's take care of its encoding.

Go to the menu of my favorite PSPad. We are interested in the Format item. In it, put a checkmark next to the utf-8 encoding.

And so the encoding of the future file is windows-1251

Now about how to change the encoding of an HTML file. It's very simple:

Click the required encoding in the Format menu and the encoding will change. After that, save the file: it is re-encoded and the job is done.

With Notepad++ everything is very similar to the situation described above. Only, to work with encodings, you need to use the Encodings menu.

The only difference is that Notepad++ has menu items specially designed for converting encodings: Convert to... (superfluous in my opinion; everything is simpler in PSPad, which is why I use it). Accordingly, these are the items to click if you want to change the encoding of an HTML file.

Among other things, when saving in utf-8 we have a choice: with BOM or without BOM. As webmasters, we need UTF-8 (without BOM).

Here is what Wikipedia answers to the question "what is BOM":

To determine the Unicode representation format of a text file, a technique is used in which the character U+FEFF (zero-width no-break space), also called the byte order mark (BOM), is written at the beginning of the text. This makes it possible to distinguish UTF-16LE from UTF-16BE, because the character U+FFFE does not exist. It is also sometimes used to mark the UTF-8 format, although the concept of byte order does not apply to that format.

If you read the above text 10 times and scratch your head, it becomes clear: for utf-8 we don't need a BOM. In addition, if you save a file containing a PHP script in utf-8 with BOM, the script may misbehave: the BOM bytes come before the opening PHP tag, so they are sent to the browser as output ahead of everything else (that same zero-width no-break space).
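If you are not sure whether a file was saved with a BOM, you can look at its first three bytes. The sketch below is a hedged illustration; the helper names `has_utf8_bom` and `strip_utf8_bom` are invented, while `codecs.BOM_UTF8` is the standard-library constant for the bytes EF BB BF:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 encoded U+FEFF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if there was one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

For a file on disk, read it in binary mode (`open(path, "rb")`) before passing the bytes in; text mode may already have interpreted or hidden the mark.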

Well, it remains to take a close look at Dreamweaver.

When creating a new file, pay attention to the encoding it will be created in. To do this, in the new document window File → New (Ctrl+N), use the Preferences... button

And see what is set as the default encoding:

The default encoding of the generated HTML file in Dreamweaver

You can re-encode an open HTML file in Dreamweaver in the Page Properties dialog, which is opened from the menu Modify → Page Properties (Ctrl+J).

Select the required encoding, press OK and that's it, the re-encoding is done (the BOM is still unnecessary, so do not check that box).

Determination of the encoding by browsers

So, our HTML file is saved in the encoding we have chosen. Now let's deal with the question: How does the browser know about the encoding used in this HTML file?

There are three options here:

1. We ourselves tell the browser what encoding is set for this HTML file. This is done using the META tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In the above example, the browser is told that the downloaded HTML file is saved in the utf-8 encoding.

If the HTML file is saved in windows-1251 encoding, then:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

By the way, when re-encoding files, do not forget to update the directive in the META tag to match. Dreamweaver does this automatically when the encoding is changed, but in other text editors you need to keep the actual encoding and the META tag directive in agreement yourself.

The full HTML looks like this (I quote it to understand the question “where is the META tag with the encoding directive indicated” attention to the 4th line):

Untitled Document Well, etc.

2. Using the .htaccess file. Sometimes the server forcibly sends headers for the HTML files it serves and tells the browser a default encoding. In this case, the browser ignores the directive in the META tag and displays the HTML file in the encoding reported by the server. To have the file served in the encoding you need (hosting providers often force windows-1251), create a file called ".htaccess" in the root directory of the site on the hosting.

This file is intended for additional server configuration. The effect of .htaccess directives applies to all files and subdirectories that are located in the directory where you saved the .htaccess file.

You can create this file, for example, in Total Commander by pressing the hot key combination Shift+F4 and specifying the name of the created file .htaccess. Next, in the text editor, directives for additional default encoding settings are specified.

For utf-8 encoded HTML files in .htaccess you need to write one line:

AddDefaultCharset UTF-8

For HTML files in Windows-1251 encoding:

AddDefaultCharset Windows-1251

If your hosting tries to be clever and pays no attention to these directives, you can try:

CharsetDisable On
AddDefaultCharset Off

If this does not work, then just ask your hosting provider what you should do to disable the default encoding :). It all depends on the specific server settings of the hosting provider.
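The precedence described in option 2 (a charset in the server's Content-Type header wins over the META tag) can be sketched as follows. This is a deliberate simplification: real browsers consider more signals, such as a BOM. The function names are assumptions made for the example:

```python
from email.message import Message
from typing import Optional


def charset_from_header(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset in lower case, or None if the header has none.
    return msg.get_content_charset()


def effective_charset(content_type_header: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """Pick the encoding: the server header first, then the META tag."""
    header_charset = None
    if content_type_header:
        header_charset = charset_from_header(content_type_header)
    if header_charset:
        return header_charset
    return meta_charset.lower() if meta_charset else None


print(effective_charset("text/html; charset=Windows-1251", "utf-8"))
print(effective_charset("text/html", "UTF-8"))
```

The first call shows why a correct META tag does not help when the hosting forces its own charset: the header value is chosen and the META value is never consulted.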

3. A PHP instruction specifying the default encoding. In the file that must be displayed in the desired encoding regardless of the hosting provider's server settings, a PHP directive is placed at the very beginning:

<?php header("Content-Type: text/html; charset=utf-8"); ?>

This PHP code sends an HTTP header that tells the browser which encoding to use. In the above example, utf-8 will be used to display the page.

Against a crowbar like this, as the saying goes, the hosting provider's server settings usually have no tricks left.

Note that for the server to process PHP instructions, the file must have the .php extension (for example, index.php).

Do you have any other encoding questions? Write in the comments. We need to solve these problems once and for all 🙂

Almost every newcomer to web development sooner or later runs into encoding problems in their projects. And then, as if following a script, the bombardment of forums begins with questions about how to defeat the hated "krakozyabry" (garbled characters). The vast majority of these problems have long been known and are cured quite easily; you just need to know "where it hurts and which pill to take". So I propose to go through the most common mistakes that cause this problem; perhaps my recommendations will save you from running into them again.

First, I strongly recommend that all documents use the same encoding and that the database, namely the fields with string data, use that encoding too. It is set when the database is created, or you can specify a collation for each individual field. If you create a database using phpMyAdmin, there should be no difficulties: open the "Databases" tab, enter the name of your future database in the field under "Create database", and choose a value from the "Collation" drop-down list next to it. If you create the database with an SQL query, write something like this:

CREATE DATABASE IF NOT EXISTS `my_db_name` CHARACTER SET utf8 COLLATE utf8_general_ci;

The choice of encoding is up to you, but I would advise "UTF-8 without BOM" for documents and the "utf8_general_ci" collation for the database (Unicode, multilingual, case-insensitive). Just do not forget to play it safe and make a dump before manipulating the database! I won't describe in detail what BOM is; roughly speaking, it is an invisible marker intended to distinguish between the UTF-16LE and UTF-16BE encodings, which turned out to be unnecessary for UTF-8 and now keeps web developers from living in peace ;) The BOM is the character U+FEFF and sits at the very beginning of the document.

Why UTF-8? Here are at least a couple of reasons. You can easily display Cyrillic, a quote from Al-Mutanabbi's poems and Chinese characters on the same screen. This is because windows-1251 (cp1251) has only 256 characters, while UTF-8 can encode well over a hundred thousand, including special characters, pictograms, icons, etc. If you are going to use ajax requests on your site, that is another plus for UTF-8, because the XMLHttpRequest object gets along with this encoding, while with others you will have to struggle, sometimes unsuccessfully. A sitemap (sitemap.xml) used for indexing by search engines must be a UTF-8 encoded file. In addition, this encoding is the default for many PHP functions and is the one recommended by the W3C.

When creating a new document everything is clear, but what about an existing one whose encoding needs to be changed? One of the easiest ways is to open the document in Notepad++, open the "Encodings" menu and choose "Convert to UTF-8 without BOM". Next, change the meta tag with the encoding definition:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

For PHP files, you can send the appropriate header, but only if the file is not included into another document where such a header has already been sent. This applies both to the meta tag and to the header sent by the header() function:

header("Content-Type: text/html; charset=utf-8");

We check the result in the browser. There may be several options here:

  1. Everything is displayed fine and the issue is closed
  2. Statically written data is displayed normally, but the data from the database is still garbled
  3. Nothing has changed and the encoding is still broken

Let's start with the last point. Happy owners of dedicated servers or VPS/VDS can change the encoding with the default_charset directive in the php.ini configuration file. Those who do not have access to php.ini, or who need to change the encoding for only one site, can use the .htaccess file by writing the following into it:

# in principle, the line below is enough:
AddDefaultCharset UTF-8
# but sometimes, additional settings may be required:
DefaultLanguage ru
php_value default_charset "utf-8"

The .htaccess file is located at the root of your site. If you did not find it there, create it yourself. In regular Notepad, create a document, choose "Save as", select the file type "All files", and in the "File name" field write just the dot and the extension: ".htaccess".

Let's move on to the second point: the database has been converted to the desired encoding, but the data from it is displayed garbled on the page. First, make sure that the characters are displayed normally in the database itself. If the encoding has not "drifted" there, you can either turn to the configuration files again or run a query immediately after connecting to the database:

SET NAMES utf8;

* The query text is the same in every case, but since I don't know which extension you use to work with MySQL, I'll show several options:

// for legacy mysql_*
$db = mysql_connect("localhost", "username", "password");
mysql_select_db("db_name", $db);
mysql_query("SET NAMES utf8");

// for PDO and php versions below 5.3.6
$dbh = new PDO("mysql:host=localhost;dbname=db_name", "username", "password");
$dbh->exec("SET NAMES utf8");

// for PDO and php versions 5.3.6 and newer, can be specified directly when creating an object
$dbh = new PDO("mysql:host=localhost;dbname=db_name;charset=utf8", "username", "password");
// or
$db = new PDO("mysql:host=localhost;dbname=db_name", "username", "password", array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));

// for MySQLi
$mysqli = new mysqli("localhost", "username", "password", "db_name");
$mysqli->set_charset("utf8");

Since I raised the issue of the "outdated mysql_*", I want to draw your attention to the warning highlighted in red in the PHP documentation: the extension is deprecated. Worth thinking about...

If you had one of the standard problems, then after following some or all of the steps above, the encoding issue should be resolved. But I would also like to mention some functions that may come in handy in non-standard situations. You can read more about them in the documentation; I will just give a couple of examples without going into details:

mb_internal_encoding()
With this function, we can set or get the current script encoding:

mb_internal_encoding("UTF-8"); // set
echo mb_internal_encoding(); // no argument - get

mb_http_input() and mb_http_output()
Two functions that detect, set, or get the character encoding of the HTTP input or output:

print_r(mb_http_input("I")); // determine the encoding of the http request input data
mb_http_output("UTF-8"); // set encoding for http output
echo mb_http_output(); // get the current character encoding of the http output

iconv()
The function converts the characters of a string to the desired encoding:

echo iconv("utf-8", "cp1251", "Привет, РјРёСЂ!"); // Привет, мир! ("Hello, world!")

mb_convert_encoding()
The function is similar to iconv(), but in my opinion it is better because it behaves more predictably:

echo mb_convert_encoding("Привет, РјРёСЂ!", "cp1251", "utf-8"); // Привет, мир! ("Hello, world!")
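The trick used in the iconv() example, turning garbled text back into readable text by reversing the wrong decoding step, can be sketched in Python as well. The helper name `repair_mojibake` and its defaults are assumptions; the approach only works when you know (or correctly guess) which two encodings were mixed up, and when no bytes were lost along the way:

```python
def repair_mojibake(garbled, wrong="cp1251", right="utf-8"):
    """Undo text that was decoded with `wrong` instead of `right`."""
    # Re-encoding with the wrong table recovers the original bytes,
    # which can then be decoded with the table that was really used.
    return garbled.encode(wrong).decode(right)


# Simulate the damage: UTF-8 bytes mistakenly read as windows-1251.
original = "Привет, мир!"
garbled = original.encode("utf-8").decode("cp1251")

print(garbled)
print(repair_mojibake(garbled))
```

If the repair raises an error, the encoding pair is probably different from the assumed one.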

And in general, do not forget about the counterparts of the string functions for working with multibyte strings. Most often, they have the same name, but with the prefix mb_. The difference is easy to feel. Take, for example, the functions strlen() and mb_strlen() and conduct an experiment by measuring the length of a string:

// set internal encoding
mb_internal_encoding("utf-8");

// no difference for latin characters
echo strlen("incode"); // 6
echo mb_strlen("incode"); // 6

// but with Cyrillic the result is sad
echo strlen("инкод"); // 10
echo mb_strlen("инкод"); // 5

Maybe someone does not need this explained, but for beginners: in UTF-8 each Cyrillic letter is encoded in two bytes, and strlen() counts the number of bytes in a string, not the number of letters. So five Cyrillic characters times two gives 10. Chinese characters are generally encoded in three bytes, so to avoid misunderstandings, use the multibyte functions in such cases.
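The same byte-versus-character arithmetic can be checked outside PHP. In this Python sketch the helper name `byte_and_char_length` is made up for illustration; the first number plays the role of strlen() and the second of mb_strlen():

```python
def byte_and_char_length(text, encoding="utf-8"):
    """Return (number of bytes, number of characters) for the text."""
    return len(text.encode(encoding)), len(text)


print(byte_and_char_length("incode"))           # Latin: one byte per letter
print(byte_and_char_length("инкод"))            # Cyrillic: two bytes per letter
print(byte_and_char_length("中文"))              # Chinese: three bytes per character
print(byte_and_char_length("инкод", "cp1251"))  # single-byte encoding
```

The last line shows that the mismatch is a property of UTF-8 being variable-length: in windows-1251 every letter is exactly one byte, so both counts agree.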

I repeat that these solutions are for common cases, and in the vast majority of them they solve the problem. But if none of these methods worked in your situation, write here: we will try to figure it out together and supplement the article with a new "headache recipe" ;) With that, let me take my leave.
