Typing in programming languages. Basic principles: static and dynamic typing; strong and weak typing

The simplicity of typing in the OO approach follows from the simplicity of the object computational model. Omitting details, we can say that when an OO system executes, only one kind of event occurs, a feature call:


x.f (arg)
denoting the execution of operation f on the object attached to x, with argument arg (several arguments are possible, or none at all). Smalltalk programmers speak here of "passing the object x the message f with argument arg", but this is merely a difference in terminology and therefore insignificant.

The fact that everything rests on this Basic Construct partly explains the feeling of beauty in OO ideas.

The abnormal situations that may arise during execution follow from the Basic Construct:

Definition: type violation

A runtime type violation, or type violation for short, occurs at the moment of a call x.f (arg), where x is attached to an object OBJ, if either:

[x]. there is no component corresponding to f and applicable to OBJ, or

[x]. such a component exists, but the argument arg is unacceptable to it.

The typing problem is how to avoid such situations:

The typing problem for OO systems

When do we find out that a type violation may occur during the execution of an OO system?

The key word is when. Sooner or later you will find out that a type violation has occurred; for example, an attempt to execute a "launch torpedo" component on an Employee object cannot work and will fail. But you would surely rather find such errors as early as possible, not as late.
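The situation can be sketched in a dynamically typed language. The Employee example and the torpedo-launch component come from the text above; the class bodies and the helper `fire` are assumptions of this sketch:

```python
# Minimal sketch of a runtime type violation in a dynamically typed setting.
class Torpedo:
    def launch(self):
        return "fired"

class Employee:
    def __init__(self, name):
        self.name = name

def fire(obj):
    # Dynamic typing: nothing stops us from writing this call for any object;
    # the violation surfaces only at the moment the call is executed.
    return obj.launch()

ok = fire(Torpedo())                  # the component exists: works
try:
    fire(Employee("Hilbert"))         # no component 'launch' applicable here
    violation = None
except AttributeError as exc:
    violation = str(exc)              # the type violation, caught at runtime
```

The error appears only when the offending call actually runs, which is exactly the "late detection" the text warns about.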

Static and dynamic typing

While intermediate options are possible, there are two main approaches:

[x]. Dynamic typing: wait until the moment each call is executed, and decide then.

[x]. Static typing: based on a set of rules, determine from the source text whether type violations are possible during execution; run the system only if the rules guarantee the absence of violations.

These terms are easy to explain: with dynamic typing, type checking occurs at runtime (dynamically), while with static typing, type checking is performed on the text statically (before execution).

Static typing implies automatic checking, which is usually left to the compiler. As a result, we have a simple definition:

Definition: statically typed language

An OO language is statically typed if it comes with a set of consistent, compiler-checked rules guaranteeing that the execution of a system cannot lead to a type violation.

The literature also uses the term strong typing. It corresponds to the ultimative nature of the definition, which demands that no type violation occur at all. Weak forms of static typing are also possible, in which the rules exclude certain violations without eliminating them entirely; in this sense some OO languages are statically weakly typed. We will fight for the strongest typing.

Dynamically typed languages, also known as untyped languages, have no type declarations, and any value may become attached to any entity at runtime. Static type checking is impossible in them.

Typing rules

Our OO notation is statically typed. Its type rules were introduced in previous lectures and boil down to three simple requirements.

[x]. In declaring any entity or function, its type must be specified, for example acc: ACCOUNT. Every routine has zero or more formal arguments, whose types must be specified, for example: put (x: G; i: INTEGER).

[x]. In any assignment x := y, and in any routine call where y is the actual argument for a formal argument x, the type of the source y must be compatible with the type of the target x. The definition of compatibility rests on inheritance: B is compatible with A if it is a descendant of A, supplemented by the rules for generic parameters (see Lecture 14).

[x]. A call x.f (arg) requires that f be a component of the base class of x's type, and that f be exported to the class in which the call appears (see 14.3).
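The compatibility rule can be modeled in a few lines. The class names echo the account example used later in this lecture; the helper `compatible` is an assumption of this sketch, standing in for the compiler's check:

```python
# Toy model of rule 2: B is compatible with A if it is a descendant of A.
class ACCOUNT:
    pass

class SAVINGS_ACCOUNT(ACCOUNT):   # descendant, hence compatible with ACCOUNT
    pass

def compatible(source: type, target: type) -> bool:
    # The source type of an assignment x := y must be a descendant
    # of the target type (identity counts as descent).
    return issubclass(source, target)

ok = compatible(SAVINGS_ACCOUNT, ACCOUNT)        # accepted
rejected = compatible(ACCOUNT, SAVINGS_ACCOUNT)  # the reverse is refused
```

The reverse direction is rejected because an ACCOUNT object is not guaranteed to support everything a SAVINGS_ACCOUNT target may be asked for.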

Realism

Although the definition of a statically typed language is precise, it is not sufficient: informal criteria are needed when creating typing rules. Consider two extreme cases.

[x]. A perfectly valid language, in which every syntactically correct system is also type-correct. No type declaration rules are needed. Such languages exist (imagine Polish notation for expressions with addition and subtraction of integers). Unfortunately, no real general-purpose language meets this criterion.

[x]. A totally invalid language, easy to create by taking any existing language and adding a typing rule that makes every system invalid. By definition this language is typed: since no system satisfies the rules, no system will ever cause a type violation.

We can say that languages of the first kind are usable but useless; those of the second kind may be useful but are unusable.

In practice, we need a type system that is both usable and useful: powerful enough to meet computational needs and convenient enough not to force us to complicate things to satisfy typing rules.

Let us say a language is realistic if it is both usable and useful in practice. Unlike the definition of static typing, which yields a categorical answer to the question "Is language X statically typed?", the definition of realism is partly subjective.

In this lecture, we will make sure that the notation we are proposing is realistic.

Pessimism

Static typing by its nature leads to a "pessimistic" policy: the attempt to guarantee that no computation fails rejects computations that might have completed without error.

Consider an ordinary, non-object, Pascal-like language with distinct types REAL and INTEGER. With the declarations n: INTEGER; r: REAL, the assignment n := r will be rejected as violating the rules. Thus the compiler will reject all of the following statements:


n := 0.0 [A]
n := 1.0 [B]
n := -3.67 [C]
n := 3.67 - 3.67 [D]
If we allowed their execution, we would see that [A] always works, since any number system represents the real number 0.0 exactly, and it translates unambiguously to the integer 0. [B] will almost certainly work as well. The result of [C] is not obvious (do we want the value rounded, or the fractional part discarded?). [D] will do its job, just like the statement:


if n ^ 2 < 0 then n := 3.67 end [E]

which contains an unreachable assignment (n ^ 2 is the square of n). If n ^ 2 were replaced by n, only some executions would produce a correct result; assigning to n a large floating-point value that has no integer representation would cause a failure.

In typed languages all these examples, the working, the failing and the sometimes-working alike, are mercilessly treated as violations of the type rules and rejected by every compiler.

The question is not whether we are pessimists, but how much pessimism we can afford. Back to the requirement of realism: if the type rules are so pessimistic that they make simple computations hard to write, we reject them. But if type safety comes at a small loss of expressive power, we accept them. For example, in a development environment providing the functions round and truncate, it is right to consider the assignment n := r invalid, because this forces you to write the real-to-integer conversion explicitly instead of relying on ambiguous default conversions.
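The two explicit conversions the paragraph prefers over an implicit n := r can be shown directly; round and truncate are the two functions named in the text, here rendered with Python's built-ins:

```python
import math

r = 3.67
n_rounded = round(r)         # rounding to the nearest integer
n_truncated = math.trunc(r)  # discarding the fractional part
```

Writing the conversion out resolves exactly the ambiguity that made statement [C] above "not obvious": the programmer, not the compiler, chooses between the two results.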

Static typing: how and why

While the benefits of static typing are obvious, it's a good idea to talk about them again.

Benefits

We listed the reasons for using static typing in object technology at the beginning of the lecture. These are reliability, ease of understanding and efficiency.

Reliability comes from detecting errors that could otherwise show up only at runtime, and only in some executions. The first rule, requiring the declaration of entities and functions, introduces redundancy into the program text, allowing the compiler, through the other two rules, to detect inconsistencies between the intended and actual use of entities, components, and expressions.

Early detection of errors is also important because the longer we delay finding them, the more the cost of fixing them will increase. This property, intuitively understood by all professional programmers, is quantitatively confirmed by the well-known works of Boehm. The dependence of the cost of fixing on the time it takes to find errors is shown in a graph based on data from a number of large industrial projects and experiments carried out with a small manageable project:

Fig. 17.1. Comparative costs of bug fixes (published with permission)

Readability, or ease of understanding, has its own advantages. In all the examples of this book, the type attached to an entity tells the reader about its purpose. Readability is especially important during the maintenance phase.

Finally, efficiency can determine the success or failure of object technology in practice. In the absence of static typing, executing x.f (arg) can take arbitrarily long: not finding f in the base class of x's target at runtime, the search continues among the class's ancestors, a sure road to inefficiency. The problem can be alleviated by improving the hierarchy lookup; the authors of Self did remarkable work trying to generate the best possible code for a dynamically typed language. But it is static typing that lets an OO product approach or equal the efficiency of traditional software.

The key to static typing is the idea, already stated, that the compiler generating code for the construct x.f (arg) knows the type of x. Because of polymorphism this does not determine unambiguously which version of component f applies; but the declaration narrows the set of possible types, letting the compiler build tables that give access to the correct f at minimal, constant-bounded cost. Further optimizations, static binding and inlining, are also made possible by static typing, eliminating the overhead entirely where they apply.
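The table-based dispatch the paragraph describes can be sketched in miniature. Everything here is an illustrative assumption: the slot index stands for what a compiler would fix at compile time, and the per-class tables stand for generated vtables:

```python
# Toy sketch of constant-cost dispatch enabled by static typing:
# each feature gets a fixed slot, so a call is one indexed access,
# not a search up the inheritance hierarchy.
SLOT_F = 0  # slot index chosen "at compile time" for feature f

vtable_A = [lambda self: "A's version of f"]
vtable_B = [lambda self: "B's version of f"]  # redefinition: same slot, new code

class Obj:
    def __init__(self, vtable):
        self.vtable = vtable

def call_f(x):
    # x.f(...) compiles down to a bounded-constant-cost table lookup.
    return x.vtable[SLOT_F](x)

result = call_f(Obj(vtable_B))   # dispatches to B's redefined version
```

The point is that the slot number is the same in every descendant's table, so the lookup cost does not grow with the depth of the hierarchy.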

Arguments for dynamic typing

Despite all this, dynamic typing has not lost its adherents, in particular among Smalltalk programmers. Their arguments are based primarily on the realism discussed above. They believe that static typing limits them too much, preventing them from freely expressing their creative ideas, sometimes calling it a "chastity belt."

One can agree with this reasoning, but only for statically typed languages lacking a number of features. It is worth noting that all the type-related concepts introduced in the previous lectures are necessary: dropping any of them entails serious restrictions, while their presence gives our actions flexibility and lets us enjoy the full practicality of static typing.

Typification: the terms of success

What are the mechanisms for realistic static typing? All of them were introduced in the previous lectures, and therefore we can only briefly recall them. Listing them together shows the consistency and power of combining them.

Our type system rests entirely on the notion of class. Even basic types such as INTEGER are defined by classes, so we need no special rules for predefined types. (Here our notation differs from "hybrid" languages such as Object Pascal, Java and C++, which combine the type system of an older language with class-based object technology.)

Expanded types give us more flexibility by allowing types whose values ​​denote objects as well as types whose values ​​denote references.

The decisive role in creating a flexible type system belongs to inheritance and the associated notion of compatibility. This overcomes the main limitation of classical typed languages such as Pascal and Ada, where an assignment x := y requires the types of x and y to be identical. That rule is too strict: it prohibits using entities that may denote objects of related types (SAVINGS_ACCOUNT and CHECKING_ACCOUNT). With inheritance we require only that the type of y be compatible with the type of x; for example, x may be of type ACCOUNT and y of type SAVINGS_ACCOUNT, where the second class is an heir of the first.

In practice a statically typed language needs support for multiple inheritance. Static typing has faced fundamental accusations that it prevents viewing the same object in different ways. For instance, an object of type DOCUMENT can be transmitted over a network, and so needs the components associated with type MESSAGE. But this criticism holds only for languages restricted to single inheritance.

Fig. 17.2. Multiple inheritance

Genericity is necessary, for example, to describe flexible yet safe container data structures (such as class LIST [G]). Without this mechanism, static typing would require declaring a different class for each element type of a list.

In some cases genericity must be constrained, letting us apply operations that are valid only for entities of the generic type. If a generic class SORTABLE_LIST supports sorting, it requires from entities of type G, the generic parameter, the presence of a comparison operation. This is achieved by associating with G a class that defines the generic constraint, COMPARABLE:


class SORTABLE_LIST [G -> COMPARABLE] ...

Any actual generic parameter of SORTABLE_LIST must then be a descendant of class COMPARABLE, which has the required component.
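A comparable effect can be sketched with a bounded type variable. The COMPARABLE bound mirrors the constraint in the text; the RANKED class and all method bodies are assumptions of this sketch:

```python
from typing import Generic, List, TypeVar

class COMPARABLE:
    def __lt__(self, other) -> bool:
        raise NotImplementedError

G = TypeVar("G", bound=COMPARABLE)   # analogue of [G -> COMPARABLE]

class SORTABLE_LIST(Generic[G]):
    def __init__(self) -> None:
        self.items: List[G] = []

    def put(self, x: G) -> None:
        self.items.append(x)

    def sort(self) -> None:
        # Legal only because every G is guaranteed to support comparison.
        self.items.sort()

class RANKED(COMPARABLE):            # a valid actual generic parameter
    def __init__(self, n: int):
        self.n = n
    def __lt__(self, other) -> bool:
        return self.n < other.n

lst: SORTABLE_LIST[RANKED] = SORTABLE_LIST()
for v in (3, 1, 2):
    lst.put(RANKED(v))
lst.sort()
order = [r.n for r in lst.items]
```

A static checker would reject `SORTABLE_LIST[int]`-style instantiations whose parameter does not descend from COMPARABLE, which is precisely the role of the constraint.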

Another indispensable mechanism is attempted assignment, which provides access to objects whose types the software does not control. If y is a database object or an object received over a network, then x ?= y assigns to x the value of y if y is of a compatible type, or Void if it is not.
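The semantics of x ?= y can be sketched with a runtime type test; the helper name is an assumption of this sketch, and None plays the role of Void:

```python
# Sketch of attempted assignment: x receives y if y's type is compatible
# with the target type, Void (None here) otherwise.
class ACCOUNT:
    pass

class SAVINGS_ACCOUNT(ACCOUNT):
    pass

def attempted_assignment(y, target_type):
    return y if isinstance(y, target_type) else None

y1 = SAVINGS_ACCOUNT()
x1 = attempted_assignment(y1, ACCOUNT)       # compatible: x1 becomes y1
x2 = attempted_assignment("text", ACCOUNT)   # incompatible: x2 becomes Void
```

This is the safe escape hatch for objects arriving from outside the type-checked world: the test happens once, at the boundary, rather than being sprinkled through the code.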

Assertions, associated with classes and their components as part of the Design by Contract idea in the form of preconditions, postconditions, and class invariants, make it possible to describe semantic constraints that a type specification cannot capture. Languages such as Pascal and Ada have range types, which can restrict an entity's values, say, to the interval from 10 to 20; but no range type can ensure that i is negative and always exactly twice j. Class invariants come to the rescue, able to reflect the imposed constraints exactly, however complex they are.
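The constraint just named, i negative and always twice j, can be sketched as a class invariant checked after every operation. The class and method names are assumptions of this sketch:

```python
# A constraint no range type can express, kept as a class invariant.
class PAIR:
    def __init__(self, j: int):
        self.j = j
        self.i = 2 * j
        self._invariant()

    def set_j(self, j: int) -> None:
        self.j = j
        self.i = 2 * j
        self._invariant()

    def _invariant(self) -> None:
        # Class invariant: i is negative and always equal to twice j.
        assert self.i < 0 and self.i == 2 * self.j, "invariant violated"

p = PAIR(-5)        # i = -10: the invariant holds
held = (p.i, p.j)
```

Constructing PAIR(3) would trip the assertion, since i = 6 is not negative: the invariant, unlike a range type, relates the two attributes to each other.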

Anchored declarations are needed to avoid an avalanche of code duplication in practice. By declaring y: like x, you get the guarantee that y will follow any redeclaration of the type of x in a descendant. Without this mechanism, developers would be endlessly busy re-declaring things, trying to keep the various types consistent.

Anchored declarations are a special case of the last language mechanism we need: covariance, which we will study in detail later.

When developing software systems, one more property is in fact required, inherent in the development environment itself: fast incremental recompilation. When you write or modify a system, you want to see the effect of the change as soon as possible. With static typing you must give the compiler time to type-check. Traditional compilation techniques require recompiling the whole system (and its libraries), a process that can be excruciatingly long, especially for large-scale systems. This phenomenon was once an argument for interpreted environments, such as early Lisp or Smalltalk, which ran a system with little or no processing and no type checking. That argument is now moot: a good modern compiler detects what has changed since the last compilation and processes only the changes.

"Is the baby typed?"

Our goal - strict static typing. That is why we must avoid any loopholes in our "game by the rules", or at least accurately identify them if they exist.

The most common loophole in statically typed languages is the presence of conversions that change the type of an entity. In C and its derivatives they are called "casts". The notation (OTHER_TYPE) x tells the compiler to treat the value of x as having type OTHER_TYPE, subject to some restrictions on the possible types.

Such mechanisms bypass the limitations of type checking. Casting is widespread in C programming, including the ANSI C dialect. Even in C ++, casting, although not as frequent, remains common and perhaps necessary.

Sticking to static typing rules is not easy if they can be bypassed at any time by casting.

Typing and binding

Although as a reader of this book you surely distinguish static typing from static binding, there are people who cannot. This may partly stem from the influence of Smalltalk, which advocates a dynamic approach to both problems and can mislead one into thinking they have the same answer. (We argue in this book that it is desirable to combine static typing with dynamic binding to build robust and flexible programs.)

Both typing and binding deal with the semantics of the Basic Construct x.f (arg), but they answer two different questions:

Typing and binding

[x]. The typing question: when do we need to be sure that at runtime there will be an operation corresponding to f and applicable to the object attached to entity x (with argument arg)?

[x]. The binding question: when do we need to know which operation a given call will trigger?

Typing guarantees that at least one operation is available; binding chooses the right one.

Within the framework of the object approach:

[x]. typing is an issue because of polymorphism: since x may at runtime denote objects of several different types, we must be sure that an operation representing f is available in each of these cases;

[x]. binding is an issue because of redeclaration: since a class may change inherited components, two or more operations may claim to represent f in a given call.

Both tasks can be solved both dynamically and statically. All four solutions are presented in the existing languages.

[x]. A number of non-object languages, such as Pascal and Ada, implement static typing together with static binding. Each entity denotes objects of just one type, fixed statically. This makes the approach reliable, at the price of flexibility.

[x]. Smalltalk and other OO languages provide dynamic binding together with dynamic typing. Here the preference is for flexibility, at the expense of reliability.

[x]. Some non-object languages support dynamic typing with static binding. Among them are assembly languages and a number of scripting languages.

[x]. Static typing with dynamic binding is embodied in the notation presented in this book.

Note the peculiar position of C++, which supports static typing, though not strict typing because of the presence of casts; static binding by default; and dynamic binding for declarations explicitly marked virtual.

The reason for choosing static typing and dynamic binding is clear. The first question, "When will we know that the components exist?", calls for a static answer: "The earlier the better", that is, at compile time. The second question, "Which component should be used?", calls for a dynamic answer: "the right one", the component corresponding to the object's dynamic type, determined at runtime. This is the only acceptable solution when static and dynamic binding would give different results.

The following example of an inheritance hierarchy will help clarify these concepts:

Fig. 17.3. Types of aircraft

Consider the call:


my_aircraft.lower_landing_gear

The typing question: when do we make sure there will be a component lower_landing_gear ("extend the landing gear") applicable to the object (for a COPTER there will be none at all)? The binding question: which of several possible versions to choose?

Static binding would mean ignoring the type of the attached object and trusting the entity's declaration. As a result, for a Boeing 747-400 we would call the version developed for the ordinary 747 series, not for its 747-400 modification. Dynamic binding applies the operation required by the object, and that is the correct approach.

With static typing, the compiler will not reject the call if it can guarantee that at execution the entity my_aircraft will be attached to an object equipped with a component corresponding to lower_landing_gear. The basic technique for obtaining this guarantee is simple: my_aircraft must be declared, and the base class of its type must include such a component. Therefore my_aircraft may not be declared as AIRCRAFT, since that class has no lower_landing_gear at its level; helicopters, at least in our example, do not know how to lower landing gear. If we declare the entity as PLANE, the class containing the required component, all is well.

Smalltalk-style dynamic typing means waiting for the call and checking for the required component at the moment of execution. Such behavior is possible for prototypes and experiments, but unacceptable for production systems: in mid-flight it is too late to ask whether you have landing gear.
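The combination the text advocates, a static declaration plus dynamically bound versions, can be sketched as follows. The class names follow Fig. 17.3; the method bodies and return strings are assumptions of this sketch:

```python
# Static typing + dynamic binding for the aircraft example.
class PLANE:
    def lower_landing_gear(self) -> str:
        return "generic plane procedure"

class B_747(PLANE):
    def lower_landing_gear(self) -> str:
        return "747 procedure"

class B_747_400(B_747):
    def lower_landing_gear(self) -> str:
        return "747-400 procedure"

my_aircraft: PLANE = B_747_400()   # declared PLANE, attached to a 747-400
# Typing: the declaration PLANE guarantees the component exists.
# Binding: the version executed follows the object's dynamic type.
chosen = my_aircraft.lower_landing_gear()
```

A static-binding policy would have picked the version for the declared type; dynamic binding picks the 747-400 version, the one the object actually requires.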

Covariance and descendant hiding

If the world were simple, then the conversation about typing could be finished. We have identified the goals and benefits of static typing, examined the constraints that realistic type systems must meet, and verified that the proposed typing methods meet our criteria.

But the world is not simple. Combining static typing with some requirements of software engineering raises subtler problems than one might expect. Two mechanisms cause the trouble: covariance, the change of parameter types on redefinition, and descendant hiding, the ability of a descendant class to restrict the export status of inherited components.

Covariance

What happens to a component's arguments when its type is redefined? This is a central problem, and we have already seen several examples of it: devices and printers, singly and doubly linked lists, and so on (see Sections 16.6 and 16.7).

Here is another example to clarify the nature of the problem. Even though it is metaphorical and far from reality, its closeness to real program schemes is obvious; and in analyzing it we will often return to problems from practice.

Imagine a university ski team preparing for a championship. Class GIRL covers female skiers, BOY male skiers. Some members of both teams are ranked, having achieved good results in previous competitions. This matters to them, because ranked skiers start first, gaining an advantage over the rest. (This rule, which privileges the already privileged, is perhaps what makes slalom and cross-country skiing so attractive to many people, a good metaphor for life itself.) So we have two more classes: RANKED_GIRL and RANKED_BOY.

Fig. 17.4. Skier classification

A number of rooms are booked to accommodate the athletes: some for men only, some for girls only, some for ranked girls only. To model this we use a parallel class hierarchy: ROOM, GIRL_ROOM and RANKED_GIRL_ROOM.

Here is a sketch of class SKIER:


class SKIER feature
   roommate: SKIER
         -- Roommate.
   ... Other possible components omitted in this and subsequent classes ...
end

We are interested in two components: the attribute roommate and the procedure share, which puts another skier in the same room as the current one:


share (other: SKIER) is
      -- Choose other as roommate.
   require ... do ... end
In the declaration of entity other we could renounce type SKIER in favor of the anchored type like roommate (or like Current for both roommate and other at once). But let us forget anchored types for a moment (we will return to them later) and examine the covariance problem in its original form.

How should the types be redefined? The rules require separate accommodation for boys and girls, for ranked skiers and the rest. To achieve this we change, on redefinition, the type of component roommate, as shown below (here and in what follows, redefined elements are underlined).


roommate: GIRL
      -- Roommate.

We will correspondingly redefine the argument of procedure share. A fuller version of the class now looks like this:


roommate: GIRL
      -- Roommate.
share (other: GIRL) is
      -- Choose other as roommate.

We proceed similarly in all classes derived from SKIER (we are still not using anchored types). The result is the hierarchy:

Fig. 17.5. Member hierarchy and redefinitions

Since inheritance is specialization, the type rules require that when the result of a component, here roommate, is redefined, the new type be a descendant of the original one. The same holds for redefining the type of the argument other of routine share. This strategy, as we know, is called covariance, where the prefix "co" indicates that the parameter and result types vary together. The opposite strategy is called contravariance.
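The covariant redefinition can be sketched as follows. Python does not check argument types statically, so here the narrowing is recorded in type hints and enforced by a runtime check, which is an assumption of this sketch:

```python
# Covariant redefinition: SKIER -> GIRL for both the result type of
# roommate and the argument of share.
class SKIER:
    roommate: "SKIER"

    def share(self, other: "SKIER") -> None:
        self.roommate = other

class GIRL(SKIER):
    def share(self, other: "GIRL") -> None:
        # The argument type narrows along with the class.
        assert isinstance(other, GIRL)
        self.roommate = other

g1, g2 = GIRL(), GIRL()
g1.share(g2)
paired = g1.roommate is g2
```

So far everything is well-behaved: a GIRL shares a room with a GIRL, exactly what the redefinition is meant to express. The trouble begins only when polymorphism enters, as the following sections show.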

All our examples are convincing evidence of the practical need for covariance.

[x]. A singly linked list element LINKABLE must be chained to another element like itself, a BI_LINKABLE to another element like itself. The argument of put_right must be redefined covariantly as well.

[x]. Any routine of LINKED_LIST with an argument of type LINKABLE will, on passing to TWO_WAY_LIST, require an argument of type BI_LINKABLE.

[x]. Procedure set_alternate takes a DEVICE argument in class DEVICE, and a PRINTER argument in class PRINTER.

Covariant redefinitions are especially common because information hiding leads to procedures of the form


set_attrib (v: SOME_TYPE) is
      -- Set attrib to v.

for working with an attribute attrib of type SOME_TYPE. Such procedures are naturally covariant, since any class that changes the type of the attribute must redefine the argument of set_attrib accordingly. Although the examples given fit a single scheme, covariance is much more widespread. Think, for instance, of a procedure or function for concatenating singly linked lists (LINKED_LIST): its argument must be redefined for doubly linked lists (TWO_WAY_LIST). The universal addition operation infix "+" takes a NUMERIC argument in class NUMERIC, a REAL in class REAL, an INTEGER in class INTEGER. In parallel hierarchies of telephone services, a procedure start in class PHONE_SERVICE may require an argument ADDRESS representing the subscriber's address (for billing), while the same procedure in class CORPORATE_SERVICE needs an argument of type CORPORATE_ADDRESS.

Fig. 17.6. Communication services

What about a contravariant solution? In the skier example it would mean that if, in class RANKED_GIRL, the result type of roommate is redefined to RANKED_GIRL, then contravariance allows the argument type of share to be redefined to GIRL or SKIER. The only type not permitted by a contravariant solution is precisely RANKED_GIRL! Enough to arouse the worst suspicions of the girls' parents.

Parallel hierarchies

To leave no stone unturned, consider a variant of the SKIER example with two parallel hierarchies. It lets us model a situation already met in practice: TWO_WAY_LIST > LINKED_LIST alongside BI_LINKABLE > LINKABLE, or the telephone-service hierarchy PHONE_SERVICE.

Let there be a hierarchy with class ROOM and its descendant GIRL_ROOM (class BOY is omitted):

Fig. 17.7. Skiers and rooms

In this parallel hierarchy our skier classes, instead of roommate and share, will have the analogous components accommodation and accommodate:


description: "New variant with parallel hierarchies"
accommodate (r: ROOM) is ... require ... do ... end

Covariant redefinitions are needed here as well: in class GIRL1 both accommodation and the argument of routine accommodate should be redefined to type GIRL_ROOM, in class BOY1 to BOY_ROOM, and so on. (Remember, we are still working without anchored types.) As in the previous example, contravariance is of no use here.

The willfulness of polymorphism

Are these examples not enough to confirm the practicality of covariance? Why would anyone consider contravariance, which conflicts with practical needs (the behavior of certain young men aside)? To understand, consider the problems raised by combining polymorphism with the covariant strategy. It is easy to devise a scheme of sabotage, and you may already have built one yourself:


s: SKIER; b: BOY; g: GIRL
...
create b; create g   -- Creation of BOY and GIRL objects.
s := b               -- Polymorphic assignment.
s.share (g)

The result of the last call, quite possibly pleasing to the young men, is exactly what our type redefinitions tried to prevent. The call to share makes the BOY object, known as b and, thanks to polymorphism, also by the alias s of type SKIER, the roommate of the GIRL object known as g. Yet the call, however contrary to dormitory rules, is perfectly valid in the program text: share is an exported component of SKIER, and GIRL, the type of argument g, is compatible with SKIER, the type of the formal argument of share.
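The hostile scheme can be replayed directly. The runtime check in BOY's version stands in for its covariantly redefined argument type; class bodies are otherwise assumptions of this sketch:

```python
# A class-valid call that causes a runtime type violation under covariance.
class SKIER:
    def share(self, other: "SKIER") -> None:
        self.roommate = other

class GIRL(SKIER):
    pass

class BOY(SKIER):
    def share(self, other: "BOY") -> None:
        # Covariant redefinition: in BOY the argument narrows to BOY.
        assert isinstance(other, BOY), "type violation: BOY expected"
        self.roommate = other

b = BOY()
g = GIRL()
s: SKIER = b          # polymorphic assignment: s := b
try:
    s.share(g)        # class-valid: GIRL is compatible with SKIER
    violation = None
except AssertionError as exc:
    violation = str(exc)   # dynamic binding reaches BOY's version
```

Every individual rule is respected, yet the covariantly narrowed version receives an argument it cannot accept: the essence of the system validity problem discussed below.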

The parallel-hierarchy scheme is just as simple: replace SKIER with SKIER1, and the call to share with s.accommodate (gr), where gr is an entity of type GIRL_ROOM. The result is the same.

With a contravariant solution these problems would not arise: specializing the target of the call (s in our example) would require generalizing the argument. Contravariance thus leads to a simpler mathematical model of the inheritance-redefinition-polymorphism mechanism, a fact described in a number of theoretical articles advocating that strategy. The argument is not very convincing, because, as our examples and other publications show, contravariance has no practical use.

Therefore, without trying to pull contravariant clothing over a covariant body, one should accept the covariant reality and look for ways to eliminate the undesirable effect.

Hiding by a child

Before seeking a solution to the covariance problem, let us consider another mechanism that can lead to type violations in the presence of polymorphism. Descendant hiding is the ability of a class not to export a component inherited from its parent.

Fig. 17.8. Descendant hiding

A typical example is the component add_vertex, exported by class POLYGON but hidden by its descendant RECTANGLE (because of a possible invariant violation: the class wants to remain a rectangle):


class RECTANGLE inherit
   POLYGON
      export {NONE} add_vertex end
   ...
A non-programming example: class OSTRICH hides the component fly inherited from its parent BIRD.

Let us accept this scheme for a moment and leave aside the question of whether combining inheritance with hiding is legitimate. The modeling role of hiding, as with covariance, is undermined by the tricks polymorphism makes possible. Here too it is easy to build a hostile example that, despite the hiding of the component, calls it and adds a vertex to a rectangle:


create r             -- Creation of the RECTANGLE object.
p := r               -- Polymorphic assignment.
p.add_vertex (...)

Since the RECTANGLE object attached to r hides behind the entity p of type POLYGON, and add_vertex is an exported component of POLYGON, the call through p is valid. Its execution will give the rectangle an extra vertex, which means an invalid object is created.
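The same trick can be replayed in a sketch. Python has no export restrictions, so the hiding is simulated by a runtime refusal; that simulation, and all method bodies, are assumptions of this sketch:

```python
# Descendant hiding defeated by polymorphism.
class POLYGON:
    def __init__(self, vertices):
        self.vertices = list(vertices)

    def add_vertex(self, v) -> None:
        self.vertices.append(v)

class RECTANGLE(POLYGON):
    def add_vertex(self, v) -> None:
        # Simulates the export restriction: the descendant withdraws
        # the component its parent offered.
        raise TypeError("add_vertex is hidden in RECTANGLE")

r = RECTANGLE([(0, 0), (2, 0), (2, 1), (0, 1)])
p: POLYGON = r            # polymorphic assignment: p := r
try:
    p.add_vertex((5, 5))  # valid for p's static type POLYGON
    reached = False
except TypeError:
    reached = True        # the hidden component was invoked anyway
```

The call is perfectly legal for the static type POLYGON, yet it reaches a component the dynamic type meant to withdraw: the same shape of failure as in the covariance example.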

Correctness of systems and classes

We need a few new terms to discuss the problems of covariance and descendant hiding. Let us call class-valid a system that satisfies the three type rules stated at the beginning of the lecture. Recall them: every entity has a type; the type of an actual argument must be compatible with the type of the formal one, and likewise for assignment; a called component must be declared in its class and exported to the class containing the call.

The system is called system-valid if there is no type violation when it is executed.

Ideally, the two notions would coincide. However, we have already seen that with inheritance, covariance, and descendant hiding, a class-valid system may fail to be system-valid. We will call such an error a system validity error.

Practical aspect

The simplicity of the problem creates a kind of paradox: an inquisitive beginner can build a counterexample in a matter of minutes; yet in real practice, while class validity errors occur every day, violations of system validity are extremely rare even in large, multi-year projects.

However, this does not allow us to ignore them, and therefore we begin to study three possible ways to solve this problem.

Next we touch on some very subtle aspects of the object approach, which make themselves felt rather rarely. On a first reading of the book you may skip the remaining sections of this lecture. If you have only recently taken up OO technology, you would do better to master this material after studying the lectures on the methodology of inheritance in the course "Fundamentals of Object-Oriented Design", especially lecture 6.

Correctness of systems: first approximation

Let's concentrate first on the covariance problem, the more important of the two. There is an extensive literature devoted to this topic, offering a range of different solutions.

Contravariance and nonvariance

Contravariance eliminates the theoretical problem of system validity violations, but at the price of losing the realism of the type system; for this reason we need not consider the approach any further.

The originality of C++ is that it adopts the novariance strategy, forbidding any change to argument types in redefined subroutines! If C++ were a strongly typed language, its type system would be hard to use. The simplest way around this, as around other limitations of C++ (say, the absence of constrained genericity), is casting: type conversion that lets you bypass the typing mechanism altogether. This solution is unattractive. Note, however, that several of the proposals discussed below also rely on novariance, making it workable by introducing new type mechanisms in place of covariant redefinition.

Using generic parameters

Genericity is at the heart of an interesting idea pioneered by Franz Weber. Declare class SKIER1 with a generic parameter constrained by ROOM:


class SKIER1 [G -> ROOM] feature
accommodate (r: G) is ... require ... do accommodation := r end

Class GIRL1 will then inherit from SKIER1 [GIRL_ROOM], and so on. The same technique, strange as it may seem at first glance, can be used even in the absence of a parallel hierarchy, by giving class SKIER itself such a parameter.

This approach solves the covariance problem. Wherever the class is used, an actual generic parameter, ROOM or GIRL_ROOM, must be supplied, so a wrong combination simply becomes impossible. The language becomes novariant, and the system satisfies all its covariance needs through generic parameters.

Unfortunately, this technique is unacceptable as a general solution, since it leads to a proliferation of generic parameters: one for each type of possible covariant argument. Worse, adding a covariant subroutine whose argument type is not yet in the list requires adding a generic parameter to the class; this changes the class interface and forces changes on all of the class's clients, which is unacceptable.
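For comparison only, Weber's scheme can be mimicked with a bounded type parameter in Python's typing vocabulary (the names Skier1, Room, GirlRoom below are hypothetical stand-ins for the Eiffel classes); a static checker such as mypy then rejects a mismatched room, which is the point of the technique.

```python
from typing import Generic, Optional, TypeVar

class Room: ...
class GirlRoom(Room): ...

R = TypeVar("R", bound=Room)   # generic parameter constrained to Room

class Skier1(Generic[R]):
    def __init__(self) -> None:
        self.accommodation: Optional[R] = None

    def accommodate(self, r: R) -> None:
        self.accommodation = r

girl: Skier1[GirlRoom] = Skier1()   # actual generic parameter fixed here
girl.accommodate(GirlRoom())        # valid: the types agree
# girl.accommodate(Room())          # rejected statically by mypy
print(type(girl.accommodation).__name__)   # GirlRoom
```

Note how every covariant argument type would need its own parameter; that is exactly the proliferation objected to above.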

Type variables

Several authors, including Kim Bruce, David Shang, and Tony Simons, have proposed solutions based on type variables: variables whose values are types. Their idea is simple:

[x]. instead of covariant overrides, allow type declarations using type variables;

[x]. extend the type compatibility rules to control such variables;

[x]. provide the ability to use the language's types as values of type variables.

Readers can find a detailed presentation of these ideas in a number of articles on the topic, as well as in publications by Cardelli, Castagna, Weber, and others; the sources listed in the bibliographic notes to this lecture are a good starting point. We will not pursue this approach ourselves, and here is why.

[x]. A properly implemented type-variable mechanism belongs to the category of mechanisms that allow a type to be used without its full specification. Genericity and anchored declarations belong to the same category, and type variables could replace them. At first this may seem a point in their favor, but the result could be disastrous: it is not clear that such an all-embracing mechanism can handle every task with the ease and simplicity characteristic of genericity and anchoring.

[x]. Suppose you have devised a type-variable mechanism that overcomes the problems of combining covariance with polymorphism (still ignoring descendant hiding). The class developer would then need extraordinary intuition to decide, in advance, which components will have their types redefinable in descendant classes and which will not. Below we discuss this problem as it arises in practice; alas, it casts doubt on the applicability of many theoretical schemes.

This brings us back to the mechanisms already discussed: constrained and unconstrained genericity, anchored types and, of course, inheritance.

Relying on anchored declarations

We will find an almost ready-made solution to the covariance problem by looking closely at the familiar mechanism of anchored declarations.

When writing the classes SKIER and SKIER1, you must have felt the temptation to use anchored declarations to avoid the many redefinitions. Anchoring is the covariant mechanism par excellence. This is what our example looks like (all changes are underlined):


share (other: like Current) is ... require ... do
accommodate (r: like accommodation) is ... require ... do

Now the descendants can leave class SKIER unchanged, and in SKIER1 they need only redefine the attribute accommodation. The anchored entities (the attribute roommate and the arguments of the subroutines share and accommodate) change automatically. This greatly simplifies the work, and confirms that without anchoring (or some similar mechanism, such as type variables) it is impossible to write realistically typed OO software.

But have we eliminated the violations of system validity? No! We can still outwit the type checker with polymorphic assignments that violate system validity.

True, the original versions of the examples will now be rejected. Let:


create b; create g -- Creation of BOY and GIRL objects.
s := b -- Polymorphic assignment.

The argument g passed to share is now invalid: share requires an object of type like s, and class GIRL is incompatible with that type, since by the anchored-type rule no type is compatible with like s except itself.

But our joy is short-lived. In the other direction, the rule says that like s is compatible with s. So, using the polymorphism not only of the entity s but also of the argument g, we can once again bypass the type checking system:


s: SKIER; b: BOY; g: like s; actual_g: GIRL
create b; create actual_g -- Creation of BOY and GIRL objects.
s := actual_g; g := s -- Use s to attach g to the GIRL object.
s := b -- Polymorphic assignment.

As a result, the illegal call passes.
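The two-step bypass can be replayed in Python (hypothetical names; the run-time check inside share stands in for the dynamic type violation that the anchored declarations failed to rule out statically):

```python
class Skier:
    def share(self, other):
        # Covariant intent: the roommate must have the same dynamic type.
        if type(other) is not type(self):
            raise TypeError("%s cannot share with %s"
                            % (type(self).__name__, type(other).__name__))
        self.roommate = other

class Boy(Skier): ...
class Girl(Skier): ...

b, actual_g = Boy(), Girl()
s = actual_g        # s: SKIER
g = s               # g: like s -- accepted, both sides have the same anchor
s = b               # polymorphic assignment re-aims the anchor
try:
    s.share(g)      # statically plausible, dynamically BOY with GIRL
except TypeError as err:
    print(err)      # Boy cannot share with Girl
```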

There is a way out. If we are seriously prepared to use anchored declarations as the sole covariance mechanism, we can eliminate system validity violations by completely disabling the polymorphism of anchored entities. This requires a language change: we introduce a new keyword anchor (a hypothetical construction, needed only for this discussion):


Allow declarations of type like s only when s is declared as anchor. Change the compatibility rules so that s and elements of type like s can be attached (in assignments or argument passing) only to one another.

With this approach we remove from the language the ability to redefine the type of any subroutine argument. We could also forbid redefining result types, but that is not necessary; the ability to redefine attribute types is of course preserved. All argument type redefinitions will now happen implicitly, through the anchoring mechanism that covariance triggers. Where, under the previous approach, a class D redefined an inherited component as:


whereas in C, the parent of D, it read


where Y conformed to X, the redefinition of the component r will now look like this:


It remains only to redefine, in class D, the type of your_anchor.

We will call this solution to the covariance-polymorphism problem the Anchoring approach; more accurately, "Covariance through Anchoring only". Its properties are attractive:

[x]. Anchoring rests on the idea of a strict separation between covariant and potentially polymorphic (for short, polymorphic) elements. All entities declared as anchor or like some_anchor are covariant; all others are polymorphic. Within each of the two categories any attachments are allowed, but no entity or expression crosses the boundary: you cannot, for example, assign a polymorphic source to a covariant target.

[x]. This simple and elegant solution is easy to explain, even for beginners.

[x]. It completely eliminates the possibility of system correctness violation in covariantly constructed systems.

[x]. It retains the conceptual framework laid out above, including constrained and unconstrained genericity. (As a result this solution is, in my opinion, preferable to type variables, which would replace the covariance and genericity mechanisms designed to address distinct practical problems.)

[x]. It requires only a minor language change (the addition of one keyword, reflected in the compatibility rule) and raises no visible implementation difficulties.

[x]. It is realistic (at least in theory): any previously possible system can be rewritten by replacing covariant redefinitions with anchored redeclarations. True, some attachments then become invalid, but they correspond precisely to the cases that can lead to type violations; they should therefore be replaced by assignment attempts, with the situation resolved at run time.

It would seem that the discussion could end here. So why is the Anchoring approach not fully satisfying? First, we have not yet touched the problem of descendant hiding. But the main reason for continuing is the problem already raised in the brief mention of type variables. The division of the world into a polymorphic part and a covariant part is somewhat reminiscent of the outcome of the Yalta conference. It assumes that the class developer has extraordinary intuition: that for each entity he introduces, in particular for each argument, he can choose once and for all between two possibilities:

[x]. The entity is potentially polymorphic: now or later, through argument passing or assignment, it may become attached to an object whose type differs from the declared one. Its declared type can never be redefined by any descendant of the class.

[x]. The entity is subject to type redefinition: it is either anchored or is itself an anchor.

But how can a developer foresee all this? Much of the attractiveness of the OO method, expressed above all in the Open-Closed principle, lies precisely in the possibility of changing previously completed work, and in the fact that the developer of general-purpose solutions need not possess infinite wisdom about how descendants will adapt the product to their needs.

In this light, type redefinition and descendant hiding act as a kind of "safety valve", making it possible to reuse an existing class that is almost, but not quite, suitable for our purposes:

[x]. By resorting to type redefinition, we can change declarations in the descendant class without touching the original. A purely covariant solution would instead require correcting the original through the transformations described above.

[x]. Descendant hiding protects against many failures of class design. One may criticize a design in which RECTANGLE, exploiting the fact that it is a descendant of POLYGON, tries to add a vertex; an inheritance structure separating figures with a fixed number of vertices from all others would avoid the problem. When designing inheritance structures it is indeed always preferable to avoid taxonomy exceptions. But can they be eliminated entirely? When we discuss export restrictions in a later lecture we will see that they cannot, for two reasons: the existence of competing classification criteria, and the likelihood that the developer will not find the perfect solution even when one exists.

Global analysis

This section is devoted to describing the intermediate approach. The main practical solutions are outlined in Lecture 17.

While exploring the anchoring option, we noticed that its main idea was to separate the covariant entities from the polymorphic ones. Thus, if we take two instructions of the form


each is an example of the correct use of an important OO mechanism: the first of polymorphism, the second of type redefinition. The problems start when the two are combined for the same entity s. Similarly:


the problem arises from the combination of two independent and perfectly innocent instructions.

The erroneous calls lead to type violations. In the first example, the polymorphic assignment attaches a BOY object to the entity s, which makes g an invalid argument for share, since g is attached to a GIRL object. In the second example, a RECTANGLE object is attached to the entity p, and RECTANGLE excludes add_vertex from its exported components.

Here is the idea of the new solution: in advance (statically, when types are checked by the compiler or other tools) we determine the typeset of each entity: the set of types of the objects to which the entity may become attached at run time. Then, again statically, we check that every call is valid for every element of the typesets of its target and its arguments.

In our examples, the instruction s := b puts class BOY into the typeset of s (because, thanks to the creation instruction create b, BOY belongs to the typeset of b). GIRL, thanks to the instruction create g, belongs to the typeset of g. But then the call to share is invalid for target s of type BOY and argument g of type GIRL. Likewise, RECTANGLE belongs to the typeset of p because of the polymorphic assignment, and the call to add_vertex for p of type RECTANGLE turns out to be invalid.

These observations lead us to a global approach based on a new typing rule:

System correctness rule

A call x.f (arg) is system-valid if and only if it is class-valid for x and arg of every type drawn from their respective typesets.

In this definition, a call is class-valid if it does not violate the Component Call rule, which states: if C is the base class of the type of x, then the component f must be exported by C, and the type of arg must be compatible with the type of the formal parameter of f. (Remember: for simplicity we assume that every subroutine has exactly one parameter; extending the rule to any number of arguments presents no difficulty.)

System validity of a call thus reduces to class validity, except that it is checked not for individual elements but for every pair drawn from the typesets. Here are the basic rules for building the typeset of each entity:

1 For each entity, the initial typeset is empty.

2 On encountering an instruction of the form create (SOME_TYPE) a, add SOME_TYPE to the typeset of a. (For simplicity, assume that every instruction create a has been replaced by create (ATYPE) a, where ATYPE is the declared type of a.)

3 On encountering an assignment of the form a := b, add to the typeset of a all the elements of the typeset of b.

4 If a is a formal parameter of a subroutine, then on encountering a call with actual argument b, add to the typeset of a all the elements of the typeset of b.

5 Repeat steps (3) and (4) until the typesets stop changing.

This formulation ignores genericity, but the rule can be extended as needed without difficulty. Step (5) is required because of possible chains of assignment and argument passing (from b to a, from c to b, and so on). It is easy to see that the process terminates after a finite number of steps.
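Steps (1)-(5) amount to a small fixed-point computation. The sketch below (entities as strings, assignments and argument passing merged into "flows") is a deliberately simplified, hypothetical rendering of the rules, not the algorithm of the monograph:

```python
def compute_typesets(creations, flows):
    """creations: (entity, type) pairs from create instructions.
    flows: (source, target) pairs covering both assignments
    target := source and actual-to-formal argument passing."""
    typesets = {}
    for entity, t in creations:            # steps (1)-(2)
        typesets.setdefault(entity, set()).add(t)
    changed = True
    while changed:                         # step (5): iterate to a fixed point
        changed = False
        for src, dst in flows:             # steps (3)-(4)
            target_set = typesets.setdefault(dst, set())
            new = typesets.get(src, set()) - target_set
            if new:
                target_set |= new
                changed = True
    return typesets

# The running example: create b; create g; s := b; g passed to formal 'other'.
sets = compute_typesets(
    creations=[("b", "BOY"), ("g", "GIRL")],
    flows=[("b", "s"), ("g", "other")])
print(sets["s"], sets["other"])    # {'BOY'} {'GIRL'}
```

The call s.share (g) must then be checked for every pair drawn from the typesets of s and of the formal argument, which exposes the invalid BOY/GIRL combination.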

As you may have noticed, the rule ignores the order of the instructions. In the sequence


create (TYPE1) t; s := t; create (TYPE2) t

the typeset of s will contain both TYPE1 and TYPE2, although, given the order of the instructions, s can only take values of the first type. Taking instruction order into account would require the compiler to perform deep flow analysis, making the algorithm excessively complex. Instead, a more pessimistic rule applies: the sequence of operations


will be declared system-invalid, even though executing it causes no type violation.

Global analysis of a system was presented (in more detail) in the 22nd chapter of the monograph. It solved both the covariance problem and the problem of export restriction under inheritance. However, the approach has an annoying practical flaw: it requires checking the system as a whole rather than each class separately. Rule (4) proves fatal: a call to a library subroutine must take into account all possible calls to it from other classes.

Although algorithms working on individual classes were later proposed, their practical value was never established. This meant that in a programming environment supporting incremental compilation the whole system would have to be rechecked, whereas validation should ideally be part of the (fast) local processing of the changes a user makes to a few classes. The global approach is not without precedent (C programmers, for example, use the tool lint to find inconsistencies in a system that the compiler does not detect), but all this does not look very attractive.

As a result, as far as I know, system validity checking has never been implemented. (Another reason for this outcome may be the complexity of the checking rules themselves.)

Class validity assumes class-by-class checking and is therefore compatible with incremental compilation. System validity implies a global check of the entire system, which conflicts with incremental compilation.

However, despite its name, system validity can in fact be checked using only incremental, class-by-class analysis (while running a normal compiler). This will be the final contribution to solving the problem.

Beware of polymorphic catcalls!

The System validity rule is pessimistic: for the sake of simplicity it also rejects completely safe combinations of instructions. Paradoxically, we will build the final solution on an even more pessimistic rule. Naturally, this raises the question of how realistic the result will be.

Back to Yalta

The essence of the Catcall solution (the meaning of the term will be explained shortly) is a return to the spirit of the Yalta agreements: dividing the world into a polymorphic part and a covariant part (with descendant hiding as covariance's satellite), but without requiring infinite wisdom.

As before, we narrow the covariance question to two operations. In our main example these are the polymorphic assignment s := b and the covariant subroutine call s.share (g). In looking for the real culprit of the violation we can exclude the argument g from the suspects: any argument of type SKIER, or of a type derived from it, would fail us, because of the polymorphism of s and the covariance of share. Indeed, if we statically declare an entity other as SKIER and dynamically attach a SKIER object to it, the call s.share (other) looks statically impeccable, yet produces a type violation if s has been polymorphically assigned the value b.

The fundamental problem is that we are trying to use s in two incompatible ways: as a polymorphic entity and as the target of a covariant subroutine call. (In our other example, the problem is the use of p both as a polymorphic entity and as the target of a call to a subroutine that a descendant hides: add_vertex.)

The Catcall solution, like Anchoring, is radical: it prohibits using an entity as polymorphic and covariant at the same time. Like global analysis, it statically determines which entities may be polymorphic; however, it does not try to be clever about computing the sets of their possible types. Instead, any polymorphic entity is treated as suspect and is forbidden from associating with the respectable circle of covariance and descendant hiding.

One rule and several definitions

The type rule for Catcall's solution is simple:

Catcall's type rule

Polymorphic catcalls are invalid.

It is based on just as simple definitions. First of all, a polymorphic entity:

Definition: polymorphic entity

An entity x of a reference (non-expanded) type is polymorphic if it has one of the following properties:

1 It occurs in an assignment x := y, where the entity y is of a different type or is itself polymorphic, recursively.

2 It occurs in a creation instruction create (OTHER_TYPE) x, where OTHER_TYPE is not the type declared for x.

3 It is a formal argument of a subroutine.

4 It is an external function.

The purpose of this definition is to confer polymorphic status ("potentially polymorphic") on every entity that may become attached to objects of different types during execution. The definition applies only to reference types, since expanded entities cannot be polymorphic by nature.

In our examples, the skier s and the polygon p are polymorphic by rule (1): the first is assigned a BOY object b, the second a RECTANGLE object r.

If you recall the definition of typesets, you will have noticed how much more pessimistic the definition of a polymorphic entity is, and how much easier it is to check. Without trying to compute the set of possible dynamic types of an entity, we settle for a cruder question: can this entity be polymorphic at all? The most surprising part is rule (3), by which every formal parameter is considered polymorphic (unless its type is expanded, as with integers and the like). We do not even bother to analyze the calls: if a subroutine has an argument, that argument is entirely at the disposal of the clients, so the type given in its declaration cannot be relied on. This rule is closely tied to reuse, the goal of object technology: any class may end up in a library and be called many times by different clients.

The characteristic property of this rule is that it requires no global checking. To determine whether an entity is polymorphic, it suffices to examine the text of its own class. If we record the polymorphism status of every query (attribute or function), we need not even examine the ancestors' texts. Unlike the computation of typesets, the detection of polymorphic entities can proceed class by class, under incremental compilation.
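The class-by-class check can be sketched as follows; the flat representation of declarations and assignments is a hypothetical simplification of what a compiler would extract from the class text:

```python
def polymorphic_entities(decls, assignments, creations, formals):
    """decls: entity -> declared type; assignments: (target, source) pairs;
    creations: (entity, created_type) pairs; formals: names of formal
    arguments. Returns the set of potentially polymorphic entities."""
    poly = set(formals)                                    # rule (3)
    poly |= {e for e, t in creations if t != decls[e]}     # rule (2)
    changed = True
    while changed:                                         # rule (1), recursive
        changed = False
        for target, source in assignments:
            if target not in poly and (decls[source] != decls[target]
                                       or source in poly):
                poly.add(target)
                changed = True
    return poly

# s := b with s: SKIER and b: BOY makes s polymorphic by rule (1).
poly = polymorphic_entities(
    decls={"s": "SKIER", "b": "BOY"},
    assignments=[("s", "b")],
    creations=[("b", "BOY")],
    formals=set())
print(poly)    # {'s'}
```

Note that the check never looks outside the class: no call graph, no whole-system pass.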

Calls, like entities, can be polymorphic:

Definition: polymorphic call

A call is polymorphic if its target is polymorphic.

Both calls in our examples are polymorphic: s.share (g) through the polymorphism of s, p.add_vertex (...) through the polymorphism of p. By definition, only qualified calls can be polymorphic. (Treating an unqualified call f (...) as the qualified call Current.f (...) changes nothing here, since Current, to which nothing can be assigned, is not polymorphic.)

Next we need the notion of catcall, based on the notion of CAT (Changing Availability or Type). A subroutine is a CAT subroutine if some redefinition of it in a descendant makes one of the two kinds of change that, as we have seen, are potentially dangerous: changing an argument type (covariantly) or hiding a previously exported component.

Definition: CAT routines

A subroutine is called a CAT subroutine if some redefinition of it changes the export status or the type of any of its arguments.

This property again allows incremental checking: any redefinition of an argument type or of the export status makes the procedure or function a CAT subroutine. Hence the notion of catcall: a call to a CAT subroutine that may turn out to be erroneous.

Definition: Catcall

A call is a catcall if some redefinition of the subroutine would make it erroneous through a change of export status or of an argument type.

The classification we have built singles out two special groups of calls: polymorphic calls and catcalls. Polymorphic calls give the object approach its expressive power; catcalls make it possible to redefine types and restrict exports. In the terminology introduced earlier in this lecture, polymorphic calls extend usefulness, catcalls extend usability.

The calls to share and add_vertex in our examples are catcalls. The first covariantly redefines its argument; the second calls a component exported by POLYGON but hidden by RECTANGLE. Both calls are also polymorphic, which makes them perfect examples of polymorphic catcalls. They are invalid under the Catcall type rule.

Evaluation

Before summarizing what we have learned about covariance and descendant hiding, let us recall once more that violations of system validity are indeed rare. The most important properties of static OO typing were summarized at the beginning of the lecture. That impressive set of type mechanisms, coupled with class validity checking, paves the way for a safe and flexible method of constructing software.

We saw three solutions to the problem of covariance, two of which also touched on export restrictions. Which one is correct?

There is no definitive answer to this question. The consequences of the insidious interaction between OO typing and polymorphism have not been studied as thoroughly as the issues of previous lectures. Numerous publications have appeared on the topic in recent years; references are given in the bibliography at the end of the lecture. I hope that this lecture has presented the elements of a definitive solution, or at least come close to one.

Global analysis seems impractical, since it requires checking the whole system; but it helped us understand the problem better.

The Anchoring solution is extremely attractive: simple, intuitive, easy to implement. All the more must we regret that it cannot support a number of key requirements of the OO method, reflected in the Open-Closed principle. If we really had perfect intuition, anchoring would be a wonderful solution; but what developer would dare claim such intuition, let alone ascribe it to the authors of the library classes his project inherits from?

If we must abandon anchoring, the Catcall solution seems the most suitable: it is easily explained and applicable in practice. Its pessimism need not rule out useful combinations of instructions: whenever a polymorphic catcall arises from a "legitimate" instruction, it can always be safely permitted by introducing an assignment attempt, thereby carrying some of the checking over to run time. The number of such cases should, however, be extremely small.

In fairness I should note that at the time of writing the Catcall solution has not been implemented. Until a compiler is adapted to Catcall type checking and successfully applied to representative systems, large and small, it is too early to claim that the last word has been said on reconciling static typing with polymorphism combined with covariance and descendant hiding.

Full compliance

To conclude the discussion of covariance, it is useful to see how a general method can be applied to a fairly general problem. The method emerged from the Catcall theory, but it can be used within the basic language, without any new rules.

Suppose we have two matched lists: the first contains skiers, the second the intended roommate of each skier in the first list. We want to perform the corresponding accommodation, share, only when the type rules permit it: girls with girls, champion girls with champion girls, and so on. Problems of this kind are common.

A simple solution is possible, based on the preceding discussion and on assignment attempt. Consider the function fitted:


fitted (other: GENERAL): like other is
        -- Current object, if its type conforms to that of the object
        -- attached to other; otherwise void.
    do
        if other /= Void and then conforms_to (other) then
            Result ?= Current
        end
    end

The function fitted returns the current object, but typed like the entity attached to its argument. If the type of the current object does not conform to the type of the object attached to the argument, it returns Void. Note the role of the assignment attempt. The function relies on the component conforms_to from class GENERAL, which determines whether the types of two objects conform.

Replacing conforms_to with another GENERAL component, same_type, yields a function perfect_fitted, which returns Void unless the types of the two objects are identical.

The function fitted gives us a simple solution to the skier-matching problem without breaking the type rules. In the class SKIER we can introduce a new procedure, safe_share, and use it instead of share (which can then be hidden):


safe_share (other: SKIER) is
        -- Choose other, if permissible, as the roommate.
    local
        gender_ascertained_other: like Current
            -- other with its gender ascertained
    do
        gender_ascertained_other := other.fitted (Current)
        if gender_ascertained_other /= Void then
            share (gender_ascertained_other)
        else
            "Conclusion: collocation with other is not possible"
        end
    end

For other of arbitrary type SKIER (not necessarily like Current), this defines a version gender_ascertained_other anchored to Current. To guarantee identity of types rather than mere conformance, use the function perfect_fitted instead.

If there are two parallel lists of skiers representing the planned accommodation:


occupant1, occupant2: LIST [SKIER]

you can organize a loop by calling at each step:


occupant1.item.safe_share (occupant2.item)

matching list items if and only if their types are fully compatible.
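As an illustration only, here is a Python analogue of fitted and safe_share (hypothetical names; isinstance plays the role of conforms_to, and None the role of Void):

```python
class Skier:
    roommate = None

    def fitted(self, other):
        # Current object, if its type conforms to that of other; else None.
        if other is not None and isinstance(self, type(other)):
            return self
        return None

    def safe_share(self, other):
        # Choose other as roommate only when the types agree.
        matched = other.fitted(self) if other is not None else None
        if matched is not None:
            self.roommate, matched.roommate = matched, self
        else:
            print("collocation with other is not possible")

class Girl(Skier): ...
class Boy(Skier): ...

g1, g2, b = Girl(), Girl(), Boy()
occupant1, occupant2 = [g1, b], [g2, Girl()]
for x, y in zip(occupant1, occupant2):   # the loop over the matched lists
    x.safe_share(y)
print(g1.roommate is g2, b.roommate is None)   # True True
```

The boy paired with a girl is simply reported as impossible, while matching pairs are placed together; an exact-type test (Python's `type(x) is type(y)`) would instead mirror perfect_fitted.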

Key concepts

[x]. Static typing is the key to reliability, readability, and efficiency.

[x]. To be realistic, static typing requires a combination of mechanisms: assertions, multiple inheritance, assignment attempt, constrained and unconstrained genericity, anchored declarations. The type system must leave no loopholes such as type casts.

[x]. The rule for redeclaration is covariant: result and argument types, when redefined, must conform to the originals.

[x]. Covariance, as well as the ability of a child to hide a component exported by an ancestor, combined with polymorphism, creates a rare but serious type violation problem.

[x]. These violations can be avoided by: global analysis (impractical), limiting covariance to anchored types (which contradicts the Open-Closed principle), or the Catcall solution, which prohibits a polymorphic target from calling a routine subject to covariance or descendant hiding.

Bibliographic notes

A number of materials from this lecture are presented in reports on the forums OOPSLA 95 and TOOLS PACIFIC 95, and also published in. A number of review materials are borrowed from the article.

The notion of automatic type deduction was introduced in, where the type deduction algorithm of the functional language ML is described. The relationship between polymorphism and type checking has been explored in the paper.

Techniques for improving the code efficiency of dynamically typed languages ​​in the context of the Self language can be found in.

Luca Cardelli and Peter Wegner wrote a theoretical article on types in programming languages ​​that had a great influence on specialists. This work, built on the basis of the lambda calculus (see), served as the basis for many further studies. It was preceded by another fundamental paper by Cardelli.

The ISE Guide includes an introduction to the problems of using polymorphism, covariance, and descendant hiding together. The lack of proper analysis in the first edition of this book triggered a number of critical discussions (the first of which was the commentary by Philippe Elinck in his bachelor's work "De la Conception-Programmation par Objets", Memoire de licence, Universite Libre de Bruxelles (Belgium), 1988). Cook's article provides several examples related to the covariance problem and attempts to solve it. A solution based on type parameters for covariant entities was proposed by Franz Weber at TOOLS EUROPE 1992. The exact definitions of system validity and class validity are given in, where a solution using complete system analysis is proposed. The Catcall solution was first proposed in; see also .

The Fixing solution was presented in my talk at the TOOLS EUROPE 1994 workshop. At that time, however, I did not see the need for anchored declarations and the related compatibility restrictions. Paul Dubois and Amiram Yehudai were quick to point out that under these conditions the covariance problem remains. They, along with Reinhardt Budde, Karl-Heinz Sylla, Kim Walden and James McKim, made many critical comments that led to the writing of this lecture.

A large body of literature has been devoted to covariance issues. In and you will find both an extensive bibliography and an overview of the mathematical aspects of the problem. For a list of links to online OOP type theory materials and their authors' Web pages, see Laurent Dami's page. The concepts of covariance and contravariance are borrowed from category theory. We owe their appearance in the context of program typing to Luca Cardelli, who began using them in his speeches from the beginning of the 80s, but did not use them in print until the end of the 80s.

Tricks based on generic variables are described in,,.

Contravariance was implemented in the Sather language. Explanations are given in.

While intermediate options are possible, two main approaches are presented here:

  • Dynamic typing: wait for the moment of completion of each call and then make a decision.
  • Static typing: Based on a set of rules, determine from the source text if type violations are possible during execution. The system is executed if the rules guarantee that there are no errors.

These terms are easy to explain: with dynamic typing, type checking occurs while the system is running (dynamically); with static typing, the check is performed on the program text (statically, before execution).

Static typing assumes an automatic check, usually assigned to the compiler. As a result, we have a simple definition:

Definition: statically typed language

An OO language is statically typed if it comes with a set of consistent compiler-checked rules to ensure that the execution of the system does not lead to type violation.

In the literature, the term "strong typing" is also used. It matches the categorical nature of our definition, which rules out type violations entirely. Weak forms of static typing are also possible, in which the rules eliminate certain violations without eliminating all of them; in this sense some OO languages are statically weakly typed. We shall argue for the strongest form of typing.

Dynamically typed languages, also known as untyped languages, have no type declarations, and any value may be attached to any entity at run time. Static type checking is not possible in them.

Typing rules

Our OO notation is statically typed. Its type rules were introduced in previous lectures and boil down to three simple requirements.

  • When declaring any entity or function, its type must be specified, for example acc: ACCOUNT. Each routine has zero or more formal arguments, whose types must be specified, for example: put (x: G; i: INTEGER).
  • In any assignment x := y, and in any routine call in which y is the actual argument for the formal argument x, the source type y must be compatible with the target type x. The definition of compatibility is based on inheritance: B is compatible with A if it is a descendant of A, augmented by rules for generic parameters (see "Introduction to Inheritance").
  • The call to x.f (arg) requires f to be a base class component for the target type x, and f must be exported to the class in which the call appears (see 14.3).
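These rules can be illustrated in any statically typed setting. Below is a minimal Python sketch (Python is used purely for illustration; the Account classes and the credit function are hypothetical names, and the annotations are what a checker such as mypy would verify) showing the compatibility rule: a descendant type is acceptable wherever the ancestor type is expected.

```python
# Hypothetical classes illustrating the compatibility rule:
# B is compatible with A if B is a descendant of A.
class Account:
    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, amount: int) -> None:
        self.balance += amount

class SavingsAccount(Account):
    pass

def credit(acc: Account, amount: int) -> None:
    # The formal argument is declared of type Account;
    # any descendant of Account conforms to it.
    acc.deposit(amount)

savings = SavingsAccount()
credit(savings, 100)      # SavingsAccount conforms to Account
print(savings.balance)    # 100
```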

Realism

Although the definition of a statically typed language is quite accurate, it is not enough - informal criteria are needed when creating typing rules. Consider two extreme cases.

  • A perfectly valid language, in which every syntactically correct system is correct with respect to types. Type declaration rules are not needed. Such languages exist (imagine Polish notation for expressions with addition and subtraction of integers). Unfortunately, no realistic general-purpose language meets this criterion.
  • A totally invalid language, which is easy to create by taking any existing language and adding a typing rule that makes every system incorrect. By definition, such a language is typed: since no system satisfies the rules, no system will cause a type violation.

We can say that languages of the first kind would be usable but are not useful; those of the second kind may be "typed" but are not usable.

In practice, we need a type system that is both usable and useful: powerful enough to meet computational needs and convenient enough not to force us to complicate things to satisfy typing rules.

Let us say that a language is realistic if it is both usable and useful in practice. Unlike the definition of static typing, which gives a categorical answer to the question "Is language X statically typed?", the definition of realism is partly subjective.

In this lecture, we will make sure that the notation we are proposing is realistic.

Pessimism

Static typing is by nature a "pessimistic" policy. In attempting to guarantee that no computation fails, it rejects computations that might have completed without error.

Consider an ordinary, non-object, Pascal-like language with distinct types REAL and INTEGER. With the declarations n: INTEGER; r: REAL, the assignment n := r will be rejected as violating the rules. Thus the compiler will reject all of the following statements:

n := 0.0            [A]
n := 1.0            [B]
n := -3.67          [C]
n := 3.67 - 3.67    [D]

If we allowed their execution, we would see that [A] always works, since any number system has an exact representation of the real number 0.0, which translates unambiguously to the integer 0. [B] will almost certainly work too. The result of [C] is not obvious (do we want rounding, or truncation of the fractional part?). [D] will do its job, just like the statement:

if n ^ 2 < 0 then n := 3.67 end    [E]

where the assignment is unreachable (n ^ 2 denotes the square of n). If we replace n ^ 2 with n, the assignment becomes reachable, and only some runs will execute it; those runs, assigning a non-integer floating-point value to n, will fail.

In typed languages all of these examples (working, failing, sometimes working) are mercilessly treated as violations of the type declaration rules and rejected by every compiler.

The question is not whether to be pessimistic, but how pessimistic we can afford to be. Back to the requirement of realism: if the type rules are so pessimistic that they make computations hard to express, we will reject them. But if type safety comes at a small loss of expressive power, we will accept them. For example, in a development environment that provides round and truncate, rejecting the assignment n := r is reasonable, because it forces the developer to state the intended real-to-integer conversion explicitly instead of relying on ambiguous default conversions.
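The point about explicit conversions can be made concrete. A small Python sketch (Python stands in here purely for illustration; its built-ins int and round play the roles of truncate and round):

```python
r = 3.67
# A static checker would reject n = r outright; writing the conversion
# explicitly forces us to choose the intended semantics:
n_truncated = int(r)    # truncate: discard the fractional part
n_rounded = round(r)    # round to the nearest integer
print(n_truncated)      # 3
print(n_rounded)        # 4
```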

Static typing: how and why

Although the benefits of static typing are obvious, it is worth recalling them.

Benefits

The reasons for using static typing in object technology were listed at the beginning of the lecture: reliability, ease of understanding, and efficiency.

Reliability comes from the detection of errors that would otherwise manifest themselves only at run time, and only in some runs. The first rule, requiring the declaration of entities and functions, introduces redundancy into the program text, which lets the compiler, using the other two rules, detect inconsistencies between the intended and actual use of entities, components, and expressions.

Early detection of errors is also important because the longer we delay finding them, the more the cost of fixing them will increase. This property, intuitively understood by all professional programmers, is quantitatively confirmed by the well-known works of Boehm. The dependence of the cost of fixing on the time it takes to find errors is shown in a graph based on data from a number of large industrial projects and experiments carried out with a small manageable project:


Fig. 17.1.

Readability (ease of understanding) has its own advantages. In all of the examples of this book, attaching a type to an entity tells the reader about its purpose. Readability is especially important during the maintenance phase.

Finally, efficiency can determine the success or failure of object technology in practice. In the absence of static typing, x.f (arg) can take an arbitrary amount of time to execute. The reason is that at run time, if f is not found in the base class of the target x, the search continues in its descendants, which is a sure road to inefficiency. The problem can be alleviated by improving the search for a component in the hierarchy; the authors of Self did remarkable work on generating the best possible code for a dynamically typed language. But it is precisely static typing that has allowed OO products to approach or equal the efficiency of traditional software.

The key to static typing is the idea, already stated, that the compiler generating code for the construct x.f (arg) knows the type of x. Because of polymorphism, it cannot determine the appropriate version of the component f unambiguously. But the declaration narrows down the set of possible types, allowing the compiler to build a table that provides access to the correct f with minimal overhead: access in bounded constant time. Further optimizations, static binding and inlining, are also made easier by static typing and eliminate the cost entirely where applicable.
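The dispatch-table idea can be sketched as follows (hypothetical Polygon and Rectangle classes; Python stands in for generated code). A real compiler builds such a table per class, so the call costs one bounded-constant-time lookup rather than a run-time search of the inheritance hierarchy.

```python
# A sketch of a dispatch table: each class maps the feature to its own
# version, so selecting the right f is one constant-time lookup.
class Polygon:
    def perimeter(self):
        return sum(self.sides)

class Rectangle(Polygon):
    def perimeter(self):          # redefined version
        return 2 * (self.width + self.height)

# The "table" the compiler would build: type -> implementation.
PERIMETER_TABLE = {
    Polygon: Polygon.perimeter,
    Rectangle: Rectangle.perimeter,
}

r = Rectangle()
r.width, r.height = 3, 4
# One table lookup selects the version for the dynamic type:
print(PERIMETER_TABLE[type(r)](r))   # 14
```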

Arguments for dynamic typing

Despite all this, dynamic typing does not lose its adherents, particularly among Smalltalk programmers. Their arguments rest primarily on the realism discussed above. They are convinced that static typing constrains them too much, preventing them from freely expressing their creative ideas, sometimes calling it a "chastity belt".

One can agree with this reasoning, but only for statically typed languages that lack a number of features. All the type-related concepts introduced in the previous lectures are necessary: rejecting any of them entails serious restrictions, while their presence gives our actions flexibility and lets us enjoy the full practicality of static typing.

Typification: the terms of success

What are the mechanisms of realistic static typing? All of them were introduced in the previous lectures, and therefore we can only briefly recall them. Listing them together shows the consistency and power of combining them.

Our type system is entirely based on the concept of class. Even basic types such as INTEGER are classes, so we need no special rules for predefined types. (Here our notation differs from "hybrid" languages such as Object Pascal, Java, and C++, where the type system of an older language is combined with class-based object technology.)

Expanded types give us more flexibility by allowing types whose values ​​denote objects as well as types whose values ​​denote references.

The decisive role in creating a flexible type system belongs to inheritance and the related concept of compatibility. This overcomes the main limitation of classical typed languages such as Pascal and Ada, in which the assignment x := y requires the types of x and y to be identical. That rule is too strict: it prohibits the use of entities that may denote objects of related types (SAVINGS_ACCOUNT and CHECKING_ACCOUNT). With inheritance, we require only that the type of y be compatible with the type of x: for example, x of type ACCOUNT and y of type SAVINGS_ACCOUNT, where the second class inherits from the first.

In practice, a statically typed language needs support for multiple inheritance. Static typing has been accused, on principle, of not letting one interpret an object in different ways. For example, a DOCUMENT object may be transmitted over a network and therefore needs the components associated with type MESSAGE. But this criticism holds only for languages limited to single inheritance.


Fig. 17.2.

Genericity is necessary, for example, to describe flexible but safe container data structures (for example, class LIST [G]). Without this mechanism, static typing would require declaring a different class for each element type.

In some cases genericity needs to be constrained, allowing the use of operations applicable only to entities of the generic type. If a generic class SORTABLE_LIST supports sorting, it requires entities of type G, the generic parameter, to have a comparison operation. This is achieved by associating with G a class defining the generic constraint, COMPARABLE:

class SORTABLE_LIST [G -> COMPARABLE] ...

Any actual generic parameter of SORTABLE_LIST must now be a descendant of class COMPARABLE, which has the required component.

Another indispensable mechanism is the attempted assignment, which organizes access to objects whose types the software does not control. If y is a database object or an object retrieved over the network, then x ?= y assigns y to x if y has a compatible type, and otherwise makes x Void.
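A rough analogue of the attempted assignment can be sketched in Python (an illustration only; attempt_assign and the account classes are hypothetical names, and None plays the role of Void):

```python
def attempt_assign(y, target_type):
    """Return y if its type conforms to target_type, else None (Void)."""
    return y if isinstance(y, target_type) else None

class Account:
    pass

class SavingsAccount(Account):
    pass

x = attempt_assign(SavingsAccount(), Account)
print(x is not None)     # True: compatible type, assignment succeeds

x = attempt_assign("not an account", Account)
print(x is None)         # True: incompatible, x gets the void value
```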

Assertions, associated as part of the Design by Contract idea with classes and their components in the form of preconditions, postconditions, and class invariants, describe semantic constraints that a type specification cannot capture. Languages such as Pascal and Ada have range types, which can limit an entity's values to, say, the interval from 10 to 20; but range types cannot express constraints such as: i must be negative, or i must always be double j. Class invariants come to the rescue, reflecting the imposed constraints exactly, however complex they are.
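A sketch of how assertions express constraints beyond any range type, using a hypothetical Account class (Python asserts stand in for contract clauses): a precondition guards the operation, and a class invariant is re-checked after each state change.

```python
# A Design-by-Contract sketch: the invariant "balance never drops
# below the minimum" is a semantic constraint no range type captures,
# because it must hold after every operation on the object.
class Account:
    MINIMUM = 0

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._check_invariant()

    def withdraw(self, amount: int) -> None:
        assert amount > 0, "precondition: amount must be positive"
        self.balance -= amount
        self._check_invariant()   # invariant re-checked on exit

    def _check_invariant(self) -> None:
        assert self.balance >= self.MINIMUM, "class invariant violated"

acc = Account(50)
acc.withdraw(20)
print(acc.balance)   # 30
```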

Anchored declarations are needed to avoid an avalanche of code duplication in practice. By declaring y: like x, you guarantee that y will follow any redeclaration of the type of x in a descendant. Without this mechanism, developers would be endlessly busy with redeclarations, trying to keep the various types consistent.

Anchored declarations are a special case of the last language mechanism we need, covariance, which we shall discuss in detail later.

When developing software systems, one more property is in fact required, belonging to the development environment itself: fast incremental recompilation. When you write or modify a system, you want to see the effect of the change as soon as possible. With static typing you must give the compiler time to type-check. Traditional compilation techniques require recompiling the whole system (and relinking it), and this process can be excruciatingly slow, especially for large systems. This phenomenon was an argument in favor of interpreted systems, such as early Lisp or Smalltalk environments, which ran a system with little or no processing and no type checking. That argument is now moot: a good modern compiler detects what has changed since the last compilation and reprocesses only the changes.

"Is the baby typed?"

Our goal - strict static typing... That is why we must avoid any loopholes in our "game by the rules", or at least accurately identify them if they exist.

The most common loophole in statically typed languages is the presence of conversions that change the type of an entity. In C and its derivatives they are called "casts". The notation (OTHER_TYPE) x tells the compiler to treat the value of x as being of type OTHER_TYPE, subject to some restrictions on the possible types.

Such mechanisms bypass the limitations of type checking. Casting is widespread in C programming, including the ANSI C dialect. Even in C ++, casting, although not as frequent, remains common and perhaps necessary.

To follow the rules static typing it is not so easy if at any moment they can be bypassed by casting.

Typing and binding

Although, as a reader of this book, you surely distinguish static typing from static binding, there are people who do not. This may be partly due to the influence of Smalltalk, which advocates a dynamic approach to both problems and can foster the misconception that they share a single solution. (We argue in this book that it is desirable to combine static typing with dynamic binding to create robust and flexible programs.)

Both typing and binding deal with the semantics of the Core Construct x.f (arg), but answer two different questions:

Typing and linking

  • The typing question: when do we need to know for sure that at runtime there will be an operation corresponding to f, applicable to the object attached to the entity x (with the arg parameter)?
  • Linking question: when do we need to know what operation a given call initiates?

Typing answers the question of availability at least one operations, binding is responsible for the selection necessary.

Within the framework of the object approach:

  • the typing problem stems from polymorphism: since x at run time may denote objects of several different types, we must be sure that an operation representing f is available in each of these cases;
  • the binding problem is caused by redeclarations: since a class may change inherited components, two or more operations may claim to represent f in a given call.

Both tasks can be solved both dynamically and statically. All four solutions are presented in the existing languages.
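The two questions can be seen side by side in a short sketch (hypothetical Aircraft and Plane classes; Python stands in for the OO notation). The typing question asks whether describe exists for every possible object attached to craft; the binding question asks which version a particular call executes.

```python
# Typing vs binding: typing asks whether SOME operation named
# describe is guaranteed to exist; binding selects WHICH version runs.
class Aircraft:
    def describe(self):
        return "aircraft"

class Plane(Aircraft):
    def describe(self):            # redeclared version
        return "plane"

craft: Aircraft = Plane()
# Typing: describe exists for every Aircraft, so the call is safe.
# Dynamic binding: the Plane version is chosen at run time.
print(craft.describe())   # "plane"
```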

The combination of static typing and dynamic binding is embodied in the notation suggested in this book.

Note the peculiarity of C++, which supports static typing (though not strict, owing to casts), static binding by default, and dynamic binding for functions explicitly declared virtual.

The reasons for choosing static typing and dynamic binding are obvious. The first question, "When will we know that the components exist?", suggests a static answer: "The earlier the better", meaning: at compile time. The second question, "Which component should be used?", suggests a dynamic answer: "the right one", that is, the component corresponding to the dynamic type of the object, determined at run time. This is the only acceptable solution when static and dynamic binding give different results.

With static typing the compiler will not reject the call if it can guarantee that, when the program runs, the entity my_aircraft will be attached to an object equipped with a component corresponding to lower_landing_gear. The basic technique for obtaining the guarantee is simple: the mandatory declaration of my_aircraft requires that the base class of its type include such a component. Thus my_aircraft cannot be declared as AIRCRAFT, since that class has no lower_landing_gear at its level; helicopters, at least in our example, do not know how to lower landing gear. If we declare the entity as PLANE, a class containing the required component, all is well.

Dynamic typing in the Smalltalk style requires waiting for the call and, at the moment of execution, checking for the presence of the required component. Such behavior is possible for prototypes and experiments but unacceptable for production systems: in mid-flight it is too late to ask whether you have landing gear.

This article discusses the difference between statically typed and dynamically typed languages, examines the concepts of "strong" and "weak" typing, and compares the power of typing systems in different languages. Recently, there has been a clear movement towards stricter and more powerful typing systems in programming, so it is important to understand what is meant when talking about types and typing.



A type is a collection of possible values. An integer can have the values 0, 1, 2, 3, and so on. A boolean can be true or false. You can invent your own type, for example the type "GiveFive", whose only possible values are "give" and "5", and nothing else. It is not a string or a number; it is a new, separate type.
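The made-up GiveFive type can be expressed, for instance, as a Python Enum (one way of many; the names are taken from the text): exactly two values belong to the type, and anything else is rejected.

```python
from enum import Enum

# "GiveFive" admits exactly the values "give" and "5", nothing else.
class GiveFive(Enum):
    GIVE = "give"
    FIVE = "5"

print(GiveFive("give"))        # GiveFive.GIVE
try:
    GiveFive("high")           # not a member of the type
except ValueError:
    print("not a GiveFive value")
```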


Statically typed languages restrict the types of variables: the language might know, for example, that x is an Integer. Then the programmer is forbidden to write x = true; that would be invalid code. The compiler will refuse to compile it, so we cannot even run it. A different statically typed language may have different expressive capabilities, and no popular type system can express our GiveFive type (though many can express other, more sophisticated ideas).


Dynamically typed languages ​​mark values ​​with types: the language knows that 1 is an integer, 2 is an integer, but it cannot know that the variable x always contains an integer.


The language runtime checks these tags at various points in time. If we try to add two values together, it can check whether they are numbers, strings, or arrays. Then it will add the values, concatenate them, or raise an error, depending on the tags.
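For instance, in Python the run-time tags are directly observable, and the same + operator means addition, concatenation, or an error depending on them:

```python
# Values carry type tags that the runtime inspects at each operation.
print(type(1).__name__)     # int
print(type("a").__name__)   # str
print(1 + 2)                # 3: numeric addition
print("a" + "b")            # ab: string concatenation
try:
    "a" + 1                 # tags don't match: run-time error
except TypeError:
    print("TypeError at run time")
```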

Statically typed languages

Static languages check the types in a program at compile time, before the program even starts. Any program in which types violate the rules of the language is invalid. For example, most static languages will reject the expression "a" + 1 (C is the exception to this rule). The compiler knows that "a" is a string and 1 is an integer, and that + only works when its left and right sides have the same type. So it does not need to run the program to discover the problem. Every expression in a statically typed language has a definite type that you can determine without running the code.


Many statically typed languages require type annotations. The Java function public int add (int x, int y) takes two integers and returns a third. Other statically typed languages can infer types automatically. The same addition function in Haskell looks like this: add x y = x + y. We do not tell the language the types, but it can figure them out on its own: it knows that + works only on numbers, so x and y must be numbers, so add takes two numbers as arguments.


This does not reduce the "static" nature of the type system. Haskell's type system is renowned for being static, strict, and powerful, and on all these fronts, Haskell is ahead of Java.

Dynamically typed languages

Dynamically typed languages require no type annotations, nor do they infer types themselves. The types of variables are unknown until they hold specific values at run time. For example, a function in Python


def f (x, y): return x + y

can add two integers, glue strings, lists, and so on, and we can't figure out what exactly is going on until we run the program. Perhaps at some point the function f will be called with two strings, and with two numbers at another moment. In this case, x and y will contain values ​​of different types at different times. Therefore, it is said that values ​​in dynamic languages ​​have a type, but variables and functions do not. A value of 1 is definitely an integer, but x and y can be anything.
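Running the function from the text shows this directly: the same parameters hold values of different types at different calls.

```python
def f(x, y):
    return x + y

# x and y have whatever types the arguments happen to have:
print(f(2, 3))           # 5: integer addition
print(f("ab", "cd"))     # abcd: string concatenation
print(f([1], [2, 3]))    # [1, 2, 3]: list concatenation
```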

Comparison

Most dynamic languages will raise an error if types are used incorrectly (JavaScript is a notorious exception; it tries to return a value for any expression, even when that makes no sense). With dynamically typed languages, even a simple error like "a" + 1 can surface in production. Static languages prevent such errors, though the degree of prevention of course depends on the power of the type system.


Static and dynamic languages are built on fundamentally different ideas about program correctness. In a dynamic language, "a" + 1 is a correct program: the code runs and an error appears at run time. In most statically typed languages, however, "a" + 1 is not a program: it will not compile and will not run. It is invalid code, just as a string of random characters !&%^@*&%^@* is invalid code. This additional notion of correctness and incorrectness has no equivalent in dynamic languages.
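The difference is easy to demonstrate: in Python the program below starts, runs its first line, and only then fails, whereas a static language would refuse to compile it at all.

```python
print("this line runs first")    # the program starts normally
try:
    "a" + 1                      # the invalid expression fails only here
except TypeError as e:
    print("run-time error:", type(e).__name__)
```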

Strong and weak typing

The concepts of "strong" and "weak" are very ambiguous. Here are some examples of their use:

    Sometimes "strong" means "static".
    It's simple, but it's better to use the term "static" because most people use and understand it.

    Sometimes "strong" means "does not do implicit type conversion".
    For example, JavaScript allows you to write "a" + 1, which can be called "weak typing". But almost all languages ​​provide some level of implicit conversion that allows you to automatically switch from integers to floating point numbers like 1 + 1.1. In reality, most people use the word "strong" to define the line between acceptable and unacceptable transformation. There is no generally accepted border, they are all imprecise and depend on the opinion of a particular person.

    Sometimes "strong" means that there is no way around the strong typing rules in the language.

  • Sometimes "strong" means memory-safe.
    C is an example of a memory-insecure language. If xs is an array of four numbers, then C will happily execute the xs or xs code, returning some value from memory immediately behind xs.

Let's stop here. Here is how some languages fit these definitions. As you can see, only Haskell is consistently "strong" in every respect; most languages are not so clear-cut.



(The "When How" in the Implicit Conversions column means that the division between strong and weak depends on which conversions we think are acceptable).


Often, the terms "strong" and "weak" refer to a vague combination of the definitions above, plus others not shown here. All this confusion makes the words "strong" and "weak" practically meaningless. When you are tempted to use them, it is better to describe exactly what you mean. For example, you might say: "JavaScript returns a value when a string is added to a number, but Python raises an error." Then we do not waste effort trying to agree on the many meanings of the word "strong", or worse, end up in an unresolved misunderstanding rooted purely in terminology.


In most cases, the terms "strong" and "weak" on the Internet are vague and ill-defined opinions of specific people. They are used to call a language "bad" or "good", and this opinion is translated into technical jargon.



Strong typing: A type system that I love and feel comfortable with.

Weak typing: The type system that bothers me or I'm not comfortable with.

Gradual typing

Can static types be added to dynamic languages? In some cases, yes; in others it is difficult or impossible. The most obvious problem is eval and similar capabilities of dynamic languages. Evaluating 1 + eval("2") in Python gives 3. But what does 1 + eval(read_from_the_network()) give? It depends on what arrives from the network at the moment of execution. If a number arrives, the expression is correct; if a string, it is not. There is no way to know before the program runs, so the type cannot be checked statically.
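The eval example from the text, with the network replaced by two hard-coded strings so that both outcomes can be shown:

```python
# When eval happens to return a number, the expression works:
print(1 + eval("2"))      # 3

# When it returns a string, the very same expression fails at run time:
try:
    1 + eval('"2"')       # eval yields the string "2"
except TypeError:
    print("type error discovered only at run time")
```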


An unsatisfying solution used in practice is to give eval() the type Any, reminiscent of Object in some object-oriented languages or interface{} in Go: a type that every value satisfies.


Values ​​of type Any are not limited by anything, so the ability of the type system to help us in code with eval disappears. Languages ​​that have both eval and a type system must drop type safety every time eval is used.


Some languages ​​have optional or gradual typing: they are dynamic by default, but allow some static annotations to be added. Python has recently added optional types; TypeScript is a JavaScript add-on that has optional types; Flow performs static analysis of good old JavaScript code.


These languages provide some of the benefits of static typing, but they can never give the absolute guarantees of truly static languages. Some functions will be statically typed and some dynamically typed; the programmer must always be aware of the difference and on guard against it.

Compiling statically typed code

When compiling statically typed code, the syntax is checked first, as in any compiler; then the types are checked. This means a static language may initially report one syntax error and, once it is fixed, complain about 100 type errors. Fixing the syntax error did not create those 100 type errors; the compiler simply had no way of detecting them until the syntax was valid.


Compilers for static languages can usually generate faster code than compilers for dynamic ones. For example, if the compiler knows that the add function takes integers, it can use the native CPU ADD instruction. A dynamic language checks types at run time, choosing among many versions of add depending on the tags (add integers? add floats? concatenate strings, or perhaps lists?), or deciding that the types do not match and raising an error. All these checks take time. Dynamic languages use various optimization tricks, such as just-in-time compilation, where code is recompiled at run time once the needed type information is available. Still, no dynamic language can match the speed of carefully written static code in a language like Rust.

Arguments for static versus dynamic types

Proponents of a static type system point out that without a type system, simple mistakes can lead to problems in production. This is, of course, true. Anyone who has used dynamic language has experienced this for themselves.


Proponents of dynamic languages point out that such languages seem easier to code in. This is definitely true for some kinds of code we write occasionally, like code using eval. Whether it holds for everyday work is debatable, and the vague word "easy" deserves caution here. Rich Hickey has an excellent talk about the word "easy" and its relation to the word "simple"; after watching it, you will realize that using the word "easy" correctly is not easy at all. Beware of "easiness".


The pros and cons of static and dynamic typing systems are still poorly understood, but they are definitely language-specific and specific to the task at hand.


JavaScript tries to keep going even if that means a meaningless conversion (like "a" + 1 yielding "a1"). Python, by contrast, is conservative and raises an error for "a" + 1.
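The Python side of this comparison is easy to demonstrate directly (a minimal sketch):

```python
# JavaScript would happily evaluate "a" + 1 to "a1".
# Python refuses the same expression at runtime:
try:
    result = "a" + 1
except TypeError as error:
    print("Python raised:", error)

# The conversion has to be spelled out explicitly:
print("a" + str(1))  # prints "a1"
```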


There are different approaches with different security levels, but Python and JavaScript are both dynamically typed languages.



Haskell, on the other hand, will not let you add an integer and a float without an explicit conversion. C and Haskell are both statically typed, despite such large differences.


There are many variations among dynamic and static languages. Any sweeping statement like "static languages are better than dynamic languages when it comes to X" is almost guaranteed to be nonsense. It may be true for specific languages, but then it is better to say "Haskell is better than Python when it comes to X".

Variety of static typing systems

Let's take a look at two famous examples of statically typed languages: Go and Haskell. Go's type system has no generic types, i.e. types with "parameters" that are themselves types. For example, you might want your own list type MyList, able to store whatever data we need: a MyList of integers, a MyList of strings, and so on, without changing the MyList source code. The compiler must keep watch over the types: if we have a MyList of integers and accidentally add a string to it, the compiler must reject the program.


Go was deliberately designed so that you can't define types like MyList (generics were added to the language only much later, in Go 1.18). The best you can do is create a MyList of "empty interfaces": MyList can hold objects, but the compiler simply does not know their types. When we take objects out of MyList, we have to tell the compiler the type ourselves. If we say "I am getting a string" but the value is actually a number, we get a runtime error, just as in dynamic languages.


Go also lacks many of the other features found in modern statically typed languages ​​(or even some systems from the 1970s). The creators of Go had their own reasons for making these decisions, but outsiders' opinions on this issue can sometimes sound harsh.


Now let's compare with Haskell, which has a very powerful type system. If we define a type MyList, then the type "list of integers" is simply MyList Integer. Haskell prevents us from accidentally adding a string to such a list, and makes sure we don't put an item from the list into a string variable.


Haskell can express much more complex ideas directly in types. For example, Num a => MyList a means "a MyList of values that all share one numeric type". It could be a list of integers, of floats, or of fixed-precision decimals, but it is guaranteed never to be a list of strings, and this is checked at compile time.


You can write an add function that works on any numeric type. This function will have type Num a => (a -> a -> a). It means:

  • a can be any numeric type (Num a =>).
  • The function takes two arguments of type a and returns type a (a -> a -> a).

One last example. If a function's type is String -> String, it takes a string and returns a string. But if it is String -> IO String, it also performs some I/O: disk access, network access, reading from the terminal, and so on.


If IO does not appear in a function's type, then we know it performs no I/O at all. In a web application, for example, you can tell whether a function modifies the database just by looking at its type. No dynamic language and almost no static language can do this. It is a feature of languages with the most powerful type systems.


In most languages we would have to read the function, and every function it calls, and so on, hunting for something that modifies the database. That is a tedious and error-prone process. Haskell's type system can answer the question simply and with a guarantee.


Compare this power with Go, which cannot even express the simple idea of MyList, let alone "a function that takes two arguments of the same numeric type and performs I/O".


The Go approach makes it easier to build tooling for the language (in particular, the compiler implementation can be simple), and there are fewer concepts to learn. How these benefits weigh against the significant limitations is a subjective question. What cannot be disputed is that Haskell is harder to learn than Go, that Haskell's type system is much more powerful, and that Haskell can prevent many more bugs at compile time.


Go and Haskell are such different languages that lumping them into one class of "static languages" can be misleading, even though the term is used correctly. In terms of practical safety benefits, Go is closer to the dynamic languages than to Haskell.


On the other hand, some dynamic languages are safer than some static languages (Python is generally considered much safer than C). When you want to generalize about static or dynamic languages as groups, keep the huge variation between individual languages in mind.

Specific examples of differences in the capabilities of typing systems

In more powerful type systems you can specify constraints at a finer level. Here are some examples; don't dwell on them if the syntax is unfamiliar.


In Go, you can say "the add function takes two integers and returns an integer":


func add(x int, y int) int {
    return x + y
}

In Haskell, you can say "the function takes any numeric type a and returns a number of the same type":


add :: Num a => a -> a -> a
add x y = x + y

In Idris, you can say "the function takes two natural numbers and returns a natural number, but the first argument must be less than the second":


add : (x : Nat) -> (y : Nat) -> {auto smaller : LT x y} -> Nat
add x y = x + y

If you try to call add 2 1, where the first argument is greater than the second, the compiler rejects the program at compile time: a program in which the first argument is greater than the second cannot even be written. Very few languages have this ability. In most languages the check happens at runtime; we would write something like if x >= y: raise SomeError().
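Spelled out in full, that runtime check looks like this in Python (a sketch; the original's SomeError is replaced by the standard ValueError):

```python
def add(x, y):
    # Without dependent types, the constraint "x must be less than y"
    # can only be enforced while the program is running:
    if x >= y:
        raise ValueError("first argument must be less than the second")
    return x + y

print(add(1, 2))  # fine: 1 < 2
# add(2, 1) raises ValueError, but only at the moment this call executes
```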


Haskell has no equivalent of the Idris example above, and Go has no equivalent of either the Haskell or the Idris example. As a result, Idris can prevent many bugs that Haskell cannot, and Haskell can prevent many bugs Go will not notice. In each case, the price of the extra type-system features is a more complex language.

The typing systems of some static languages

Here is a rough list of the type systems of some languages, in ascending order of power. It should give you a general sense of relative power; don't treat it as absolute truth. The languages within one group can differ greatly from one another, and every type system has its quirks, most of them quite complex.

  • C (1972), Go (2009): These type systems are not powerful at all, with no support for generic types. You cannot define a type MyList meaning "list of integers", "list of strings", and so on; instead you must make a "list of untyped values". The programmer has to assert "this is a string" manually every time an element is taken out of the list, and that assertion can fail at runtime.
  • Java (1995), C# (2000): Both languages support generic types, so you can say MyList<String> and get a list of strings that the compiler knows about and whose type rules it can enforce. The elements of the list have type String, the compiler enforces the rules at compile time as usual, and runtime errors become less likely.
  • Haskell (1990), Rust (2010), Swift (2014): All of these languages have several advanced features, including generic types and algebraic data types (ADTs), plus type classes or something similar (type classes, traits, and protocols, respectively). Rust and Swift are more popular than Haskell and are promoted by large organizations (Mozilla and Apple, respectively).
  • Agda (2007), Idris (2011): These languages support dependent types, allowing types like "a function that takes two integers x and y, where y is greater than x". Even the "y is greater than x" constraint is enforced at compile time: y can never end up less than or equal to x, no matter what happens. Very subtle and important properties of a system can thus be checked statically. Few programmers study these languages, but those who do are enthusiastic about them.

There is a clear movement toward more powerful type systems, especially judging by the popularity of languages rather than their mere existence. A notable exception is Go, which explains why many proponents of static languages consider it a step backwards.


Group two (Java and C #) are mainstream languages, mature and widely used.


Group three are on the cusp of entering the mainstream, with a lot of support from Mozilla (Rust) and Apple (Swift).


Group four (Idris and Agda) are far from the mainstream, but this may change over time. The languages ​​of group three were far from the mainstream a decade ago.

This article contains the bare minimum you need to know about typing so as not to call dynamic typing evil, Lisp a typeless language, or C a strongly typed language.

The full version contains a detailed description of all types of typing, seasoned with code examples, links to popular programming languages ​​and illustrative pictures.

I recommend reading the short version of the article first, and then, if desired, the full version.

Short version

With respect to typing, programming languages are usually divided into two big camps: typed and untyped (typeless). The former includes, for example, C, Python, Scala, PHP, and Lua; the latter includes assembly language, Forth, and Brainfuck.

Since "typeless typing" is about as simple as things get, it is not subdivided any further. Typed languages, however, fall into several overlapping categories:

  • Static / dynamic typing. Static typing means the final types of variables and functions are fixed at compile time: the compiler is already 100% sure which type is where. With dynamic typing, all types are discovered at runtime.

    Examples:
    Static: C, Java, C#;
    Dynamic: Python, JavaScript, Ruby.

  • Strong / weak typing (also called strict / loose). Strong typing means the language does not allow mixing different types in expressions and performs no automatic implicit conversions; for example, you cannot subtract a set from a string. Weakly typed languages perform many implicit conversions automatically, even when precision may be lost or the conversion is ambiguous.

    Examples:
    Strong: Java, Python, Haskell, Lisp;
    Weak: C, JavaScript, Visual Basic, PHP.

  • Explicit / implicit typing. Explicitly typed languages require the types of new variables, functions, and their arguments to be specified explicitly. Languages with implicit typing shift this task onto the compiler or interpreter.

    Examples:
    Explicit: C++, D, C#
    Implicit: PHP, Lua, JavaScript

Note that all these categories overlap: for example, C has static weak explicit typing, while Python has dynamic strong implicit typing.

However, there are no languages with both static and dynamic typing at the same time. (Running ahead, I'll admit I'm lying here: they do exist, but more on that later.)

Detailed version

If the short version didn't seem like enough for you, good. It was simply impossible to fit all the useful and interesting information into the short version, while the detailed one would probably be too long for everyone to read without effort.

Typeless typing

In typeless programming languages, all entities are considered simply sequences of bits of various lengths.

Typeless typing is usually found in low-level (assembly language, Forth) and esoteric (Brainfuck, HQ9+, Piet) languages. Along with its disadvantages, though, it has some advantages.

Benefits
  • Allows writing at an extremely low level, and the compiler / interpreter will not interfere with any type checking. You are free to perform any operations on any kind of data.
  • The resulting code is usually more efficient.
  • Transparency of instructions. With knowledge of the language, there is usually no doubt what this or that code is.
Disadvantages
  • Complexity. It is often necessary to represent complex values ​​such as lists, strings, or structures. This can be inconvenient.
  • Lack of checks. Any meaningless actions, such as subtracting a pointer to an array from a symbol, will be considered completely normal, which is fraught with subtle errors.
  • Low level of abstraction. Working with any complex data type is no different from working with numbers, which of course will create many difficulties.
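The "everything is just bits" view can be imitated even from a high-level language. The sketch below uses Python's struct module to reinterpret the four bytes of a float as an integer, exactly the kind of operation a typeless language permits everywhere with no questions asked:

```python
import struct

# Pack the float 1.0 into its raw 4-byte IEEE 754 representation...
raw_bytes = struct.pack("<f", 1.0)
# ...then read those same bits back as an unsigned 32-bit integer.
as_integer = struct.unpack("<I", raw_bytes)[0]
print(hex(as_integer))  # 0x3f800000
```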
Strong typeless typing?

Yes, it does exist. For example, in assembly language (for the x86/x86-64 architecture; I can't speak for others), the program will not assemble if you try to move data from the rax register (64 bits) into the cx register (16 bits).

mov cx, rax ; assembly-time error

So it turns out the assembler has some typing after all? I believe these checks are too few to count. Your opinion, of course, is yours to form.

Static versus dynamic typing

The main thing that distinguishes static typing from dynamic typing is that all type checking is done at compile time, not run time.

Some people may think static typing is too restrictive (in fact it once was, but that limitation was overcome long ago with the help of a few techniques). To others, dynamically typed languages look like playing with fire. So what features make each approach stand out? Do both deserve to exist? If not, why are there so many statically and so many dynamically typed languages?

Let's figure it out.

Benefits of static typing
  • Type checks happen only once, at compile time. This means that we will not have to constantly find out if we are trying to divide a number by a string (and either throw an error or perform a conversion).
  • Execution speed. It is clear from the previous point that statically typed languages ​​are almost always faster than dynamically typed ones.
  • Under some additional conditions, it allows you to detect potential errors already at the compilation stage.
Benefits of dynamic typing
  • Ease of creating universal collections: a heap of anything and everything (the need rarely arises, but when it does, dynamic typing helps out).
  • Convenience of describing generalized algorithms (for example, sorting an array, which will work not only on a list of integers, but also on a list of real numbers and even on a list of strings).
  • Easy to learn - dynamically typed languages ​​are usually very good at getting started with programming.
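The first bullet, universal collections, really is a one-liner in a dynamic language (a sketch in Python):

```python
# One list, four element types; no declarations, no casts:
mixed_bag = [1, "two", 3.0, [4]]
for item in mixed_bag:
    print(type(item).__name__, item)
```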

Generalized programming

Okay, the most important argument for dynamic typing is the convenience of describing generic algorithms. Let's imagine a problem: we need a search function for several kinds of arrays (or lists): an array of integers, an array of reals, and an array of characters.

How are we going to solve it? Let's solve it in 3 different languages: one with dynamic typing and two with static typing.

The search algorithm I will use is one of the simplest: linear search. The function receives the element being sought and the array (or list) itself, and returns the index of the element, or -1 if it is not found.

Dynamic solution (Python):

def find(required_element, items):
    for index, element in enumerate(items):
        if element == required_element:
            return index
    return -1
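The same function can be exercised on lists of numbers, strings, and even nested lists without any extra code (the function is repeated here so the snippet stands on its own):

```python
def find(required_element, items):
    # Linear search: return the index of the element, or -1 if absent.
    for index, element in enumerate(items):
        if element == required_element:
            return index
    return -1

print(find(3, [1, 2, 3]))          # 2
print(find("c", ["a", "b", "c"]))  # 2
print(find([4], [[1], [4]]))       # 1
print(find(9, [1, 2, 3]))          # -1
```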

As you can see, everything is simple, and there is no problem with the list holding numbers, other lists, or anything else at all. Very good. Now let's solve the same problem in C!

Static solution (C):

unsigned int find_int(int required_element, const int array[], unsigned int size) {
    for (unsigned int i = 0; i < size; ++i)
        if (required_element == array[i])
            return i;
    return (-1);
}

unsigned int find_float(float required_element, const float array[], unsigned int size) {
    for (unsigned int i = 0; i < size; ++i)
        if (required_element == array[i])
            return i;
    return (-1);
}

unsigned int find_char(char required_element, const char array[], unsigned int size) {
    for (unsigned int i = 0; i < size; ++i)
        if (required_element == array[i])
            return i;
    return (-1);
}

Well, each function taken separately resembles the Python version, but why are there three of them? Has static typing really failed us?

Yes and no. There are several programming techniques that help here, and we will now look at one of them. It is called generic programming, and C++ supports it quite well. Let's look at the new version:

Static solution (generic programming, C ++):

template <typename T>
unsigned int find(T required_element, const std::vector<T>& array) {
    for (unsigned int i = 0; i < array.size(); ++i)
        if (required_element == array[i])
            return i;
    return (-1);
}

Not bad! It doesn't look much more complicated than the Python version, and there isn't much to write. Moreover, we got an implementation for arrays of any type, not just the three the problem required!

This version seems to be exactly what we need - we get both the advantages of static typing and some of the advantages of dynamic.

It's great that this is even possible, but it could be better still. First, generic programming can be more convenient and elegant (in the Haskell language, for example). Second, besides generic programming you can also use polymorphism (with worse results), function overloading (likewise), or macros.
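One of those alternatives, function overloading, has a rough dynamic-world counterpart in Python's functools.singledispatch, which picks an implementation based on the type of the first argument (a sketch for illustration, not an endorsement of any one technique):

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # Fallback for types with no registered implementation.
    return "something else"

@describe.register(int)
def _(value):
    return "an integer"

@describe.register(str)
def _(value):
    return "a string"

print(describe(42))    # an integer
print(describe("hi"))  # a string
print(describe(3.5))   # something else
```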

Static dynamics

It should also be mentioned that many static languages allow a degree of dynamic typing, for example:

  • C# supports the dynamic pseudo-type.
  • F# supports syntactic sugar in the form of the ? operator, on top of which a simulation of dynamic typing can be built.
  • Haskell: dynamic typing is provided by the Data.Dynamic module.
  • Delphi: via the special Variant type.

Also, some dynamically typed languages ​​allow you to take advantage of static typing:

  • Common Lisp - type declarations.
  • Perl - since 5.6, rather limited.

Strong and weak typing

Strongly typed languages do not allow mixing entities of different types in expressions and perform no automatic conversions. They are also called strictly typed languages; the English term is strong typing.

Weakly typed languages, on the contrary, readily let the programmer mix different types in one expression, and the compiler itself coerces everything to a common type. They are also called loosely typed languages; the English term is weak typing.

Weak typing is often confused with dynamic typing, which is completely wrong. A dynamically typed language can be both weakly and strongly typed.

However, few people attach importance to the strictness of typing. It is often claimed that a statically typed language by itself lets you catch a lot of potential errors at compile time. They are lying to you!

The language must also be strongly typed. Indeed, if the compiler, instead of reporting an error, simply appends a string to a number, or worse, subtracts one array from another, what good are all the "type checks" at compile time? That's right: weak static typing is even worse than strong dynamic typing! (Well, that's my opinion.)

So does weak typing have no advantages at all? It may look like this, but while I am a staunch supporter of strong typing, I must agree that weak typing also has advantages.

Want to know which ones?

The benefits of strong typing
  • Reliability - You will receive an exception or compilation error instead of incorrect behavior.
  • Speed ​​- instead of hidden conversions, which can be quite expensive, with strong typing, you need to write them explicitly, which makes the programmer at least know that this piece of code can be slow.
  • Understanding the work of the program - again, instead of implicit type conversion, the programmer writes everything himself, which means he roughly understands that the comparison of a string and a number does not happen by itself and not by magic.
  • Certainty - when you write conversions by hand, you know exactly what you are converting to and what. Also, you will always understand that such transformations can lead to loss of accuracy and to incorrect results.
The benefits of weak typing
  • Convenience of using mixed expressions (for example, from integers and real numbers).
  • Abstraction from typing and focus on the task.
  • The brevity of the record.

Okay, we figured it out, it turns out that weak typing also has advantages! Are there ways to transfer the benefits of weak typing to strong typing?

It turns out there are even two.

Implicit type conversion, in unambiguous situations and without data loss

Uh, quite a long heading; let me shorten it below to "limited implicit conversion". So what do an unambiguous situation and data loss mean?

An unambiguous situation is a conversion or operation whose outcome is immediately clear. For example, adding two numbers is unambiguous. Converting a number to an array is not: perhaps an array of one element is created; perhaps an array of that length, filled with default elements; or perhaps the number is converted to a string and then to an array of characters.

Data loss is even simpler. If we convert the real number 3.5 to an integer, we lose part of the data (in fact, this operation is also ambiguous: how is the rounding done? Up? Down? By discarding the fractional part?).
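Python shows just how ambiguous that one conversion is: it offers four different answers for 3.5, and its built-in round even uses banker's rounding:

```python
import math

x = 3.5
print(int(x))         # 3: truncates toward zero
print(math.floor(x))  # 3: rounds down
print(math.ceil(x))   # 4: rounds up
print(round(x))       # 4: rounds half to even ("banker's rounding")
print(round(2.5))     # 2: also rounds half to even, hence not 3!
```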

Ambiguous transformations and lossy transformations are very, very bad. There is nothing worse than this in programming.

If you don't believe me, study the PL/I language, or just skim its specification. It has rules for converting between ALL data types! It's sheer hell!

Okay, back to limited implicit conversion. Are there such languages? Yes: in Pascal, for example, you can convert an integer to a real number, but not the other way around. Similar mechanisms exist in C#, Groovy, and Common Lisp.
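Python behaves much like the Pascal rule just described: mixed int/float arithmetic silently widens the int (no data is lost), while the lossy direction always requires an explicit call, making the loss visible at the call site (a minimal sketch):

```python
# Widening int -> float happens implicitly and loses nothing:
result = 1 + 2.5
print(result, type(result).__name__)  # 3.5 float

# Narrowing float -> int never happens implicitly; it must be spelled out:
print(int(3.9))  # 3
```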

Okay, I said there is one more way to bring a couple of weak typing's advantages into a strong language. There is, and it is called constructor polymorphism.

I'll explain it using the wonderful Haskell language as an example.

Polymorphic constructors are a result of the observation that safe implicit conversions are most often needed when using numeric literals.

For example, in the expression pi + 1, you don't want to write pi + 1.0 or pi + float(1). You just want to write pi + 1!

And in Haskell you can, because the literal 1 has no concrete type. It is neither integral, nor real, nor complex. It's just a number!

As a result, when we write a simple function sum x y that adds up all the numbers from x to y (in steps of 1), we get several versions at once: a sum for integers, for reals, for rationals, for complex numbers, and even for any numeric types you have defined yourself.

Of course, this technique only saves when using mixed expressions with numeric literals, and this is just the tip of the iceberg.
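For contrast, Python reaches a similar effect through duck typing rather than through the type system. The sketch below (function name invented) sums from x to y in steps of 1 and works for any numeric type that supports +, comparison, and construction from 0:

```python
from fractions import Fraction

def sum_range(x, y):
    # Accumulate x, x+1, ..., up to y for any sufficiently number-like type.
    total = type(x)(0)
    while x <= y:
        total += x
        x += 1
    return total

print(sum_range(1, 4))                            # 10
print(sum_range(Fraction(1, 2), Fraction(7, 2)))  # 8
```

Unlike Haskell's version, nothing here is checked before the program runs: pass in a string and the failure shows up only at the call site.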

Thus, we can say that the best solution would be to balance on the brink, between strong and weak typing. But so far no language is in perfect balance, so I tend to lean more towards strongly typed languages ​​(such as Haskell, Java, C #, Python) rather than weakly typed ones (such as C, JavaScript, Lua, PHP).

Explicit versus implicit typing

An explicitly typed language requires the programmer to specify the types of all the variables and functions they declare. The English term for this is explicit typing.

An implicitly typed language, on the other hand, invites you to forget about types and leave the task of type inference to the compiler or interpreter. The English term for this is implicit typing.

At first, you can decide that implicit typing is equivalent to dynamic typing, and explicit typing is equivalent to static, but later we will see that this is not the case.

Are there advantages to each type, and again, are there combinations of them and are there languages ​​that support both methods?

Benefits of Explicit Typing
  • The presence of a signature for each function (for example, int add(int, int)) makes it easy to determine what the function does.
  • The programmer immediately writes down what type of value can be stored in a particular variable, which removes the need to remember this.
Benefits of Implicit Typing
  • The shorthand def add(x, y) is clearly shorter than int add(int x, int y).
  • Resilience to change. For example, if in a function the temporary variable was of the same type as the input argument, then in an explicitly typed language, when the type of the input argument is changed, the type of the temporary variable will also need to be changed.

Okay, you can see that both approaches have pros and cons (who expected anything else?), So let's look for ways to combine the two!

Explicit typing by choice

There are languages that are implicitly typed by default, with the option of specifying the types of values where needed; the translator automatically infers the real type of each expression. Haskell is one such language. Here is a simple example for clarity:

-- Without an explicit type signature
add (x, y) = x + y

-- With an explicit type signature
add :: (Integer, Integer) -> Integer
add (x, y) = x + y

Note: I deliberately used an uncurried function, and deliberately wrote a specialized signature instead of the more general add :: Num a => (a, a) -> a, because I wanted to show the idea without explaining Haskell's syntax.

Hmm, as you can see, it is pleasantly short: the function definition takes just 18 characters on one line, spaces included!

However, automatic type inference is a tricky business, and even in a language as cool as Haskell it sometimes fails (the monomorphism restriction is one example).

Are there languages that are explicitly typed by default and implicitly typed on demand? Of course.

Optional implicit typing

The new C++ standard, C++11 (formerly known as C++0x), introduced the auto keyword, which lets the compiler infer the type from context:

Let's compare:

// Manual type specification
unsigned int a = 5;
unsigned int b = a + 3;

// Automatic type deduction
unsigned int a = 5;
auto b = a + 3;

Not bad, but the code didn't shrink much. Let's look at an example with iterators (if you don't understand it, don't worry; the main thing to notice is how much shorter the code becomes thanks to automatic deduction):

// Manual type specification
std::vector<int> vec = randomVector(30);
for (std::vector<int>::const_iterator it = vec.cbegin(); ...) { ... }

// Automatic type deduction
auto vec = randomVector(30);
for (auto it = vec.cbegin(); ...) { ... }

Wow, now that's a reduction! Okay, but can we do something in the spirit of Haskell, where the return type depends on the types of the arguments?

Again, the answer is yes, thanks to the decltype keyword in combination with auto:

// Manual type specification
int divide(int x, int y) { ... }

// Automatic return type deduction
auto divide(int x, int y) -> decltype(x / y) { ... }

On its own this notation may not look like much, but combined with generic programming (templates/generics), implicit typing and automatic type inference work wonders.

Some programming languages ​​according to this classification

I will give a short list of popular languages ​​and write how they are categorized under each category of "typing".

JavaScript: dynamic / weak / implicit
Ruby: dynamic / strong / implicit
Python: dynamic / strong / implicit
Java: static / strong / explicit
PHP: dynamic / weak / implicit
C: static / weak / explicit
C++: static / semi-strong / explicit
Perl: dynamic / weak / implicit
Objective-C: static / weak / explicit
C#: static / strong / explicit
Haskell: static / strong / implicit
Common Lisp: dynamic / strong / implicit

I may be wrong somewhere, especially about CL, PHP, and Obj-C; if you have a different opinion on some language, write in the comments.

Typing is the assignment of types to information entities.

The most common primitive data types are:

  • Numerical
  • Character
  • Logical

The main functions of the data type system:

  • Security
    Each operation is checked for arguments of exactly the types for which it is intended;
  • Optimization
    Based on the type, a method of efficient storage and algorithms for its processing are selected;
  • Documentation
    The intentions of the programmer are emphasized;
  • Abstraction
    Using high-level data types allows the programmer to think of values ​​as high-level entities rather than a collection of bits.

Classification

There are many ways to classify typing in programming languages, but there are only three main ones:

Static / dynamic typing

Static - assignment and type consistency checking is done at compile time. Data types are associated with variables, not specific values. Static typing allows you to find typing errors made in rarely used branches of the program logic at compile time.

Dynamic typing is the opposite of static typing. In dynamic typing, all types are figured out at runtime.

Dynamic typing allows you to create more flexible software, albeit at the cost of more typing errors. Unit testing is especially important when developing software in dynamically typed programming languages, because it is the only way to find typing errors in rarely used branches of program logic.
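A tiny example of why such tests matter (a sketch; the names are invented): the bad call imports and "compiles" fine in a dynamic language, and only a test that actually executes the branch reveals the type error:

```python
def total_price(prices):
    # Fails only if a non-numeric value is actually summed at runtime.
    return sum(prices)

assert total_price([1, 2, 3]) == 6  # the happy path passes

try:
    total_price([1, "2", 3])        # the rarely used branch hides a type error
except TypeError as error:
    print("caught by the test run:", error)
```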

Dynamic typing

var luckyNumber = 777;
var siteName = "Tyapk";
// we mean a number but write a string
var wrongNumber = "999";

Static typing

let luckyNumber: number = 777;
let siteName: string = "Tyapk";
// will raise a compile-time error
let wrongNumber: number = "999";

  • Static: Java, C #, TypeScript.
  • Dynamic: Python, Ruby, JavaScript.

Explicit / implicit typing.

Explicitly typed languages require the types of new variables, functions, and their arguments to be specified explicitly. Languages with implicit typing shift this task onto the compiler or interpreter. Explicit typing is the opposite of implicit typing.

Explicit typing requires an explicit type declaration for each variable used. This type of typing is a special case of static typing, since the type of each variable is determined at compile time.

Implicit typing

let stringVar = "777" + 99; // yields "77799"

Explicit typing (fictional JS-like language)

let wrongStringVar = "777" + 99;    // throws an error
let stringVar = "777" + String(99); // yields "77799"

Strong / loose typing

Also called strict / lax typing. With strong typing, a value keeps its type "once and for all"; with lax typing, values can be silently converted to other types as the program runs.

Strongly typed languages do not allow changing a variable's data type implicitly: only explicit type conversions are permitted. Strong typing means the language does not allow mixing different types in expressions and performs no automatic implicit conversions; for example, you cannot subtract a number from a string. Weakly typed languages perform many implicit conversions automatically, even when precision may be lost or the result is ambiguous.

Strong typing (fictional JS-like language)

let wrongNumber = 777;
wrongNumber = wrongNumber + "99";
// error: a string is being added to the numeric variable wrongNumber
let trueNumber = 777 + Number("99"); // yields 876

Loose typing (as is in js)

let wrongNumber = 777;
wrongNumber = wrongNumber + "99"; // yields the string "77799"

  • Strict: Java, Python, Haskell, Lisp.
  • Lax: C, JavaScript, Visual Basic, PHP.