
When considering the laws of failure distribution, it was found that the failure rates of elements can be either constant or vary with operating time. For systems intended for long-term use, which include all transport systems, preventive maintenance is provided; it virtually eliminates the impact of wear-out failures, so that only sudden failures occur.

This greatly simplifies reliability calculations. However, complex systems consist of many elements connected in different ways. When the system is in operation, some of its elements operate continuously, others only during certain periods, and still others perform only short switching or connecting operations. Consequently, during a given period of time only some elements have an operating time that coincides with the operating time of the system, while the others operate for a shorter time.

In this case, when calculating the operating time of a given element, only the time during which the element is switched on is counted. This approach is possible if we assume that during the periods when an element is not involved in the operation of the system, its failure rate is zero.

From a reliability point of view, the most common scheme is a series connection of elements. In this case the calculation uses the product rule of reliability:

P(t) = R₁(t₁)·R₂(t₂)·…·Rₙ(tₙ),

where Rᵢ(tᵢ) is the reliability of the i-th element, which is switched on for tᵢ hours of the total system operating time of t hours.


For calculations, the so-called employment coefficient is used:

kᵢ = tᵢ / t,

i.e. the ratio of the operating time of the element to the operating time of the system. The practical meaning of this coefficient is that, for an element with a known failure rate λᵢ, the failure rate in the system, taking the operating time into account, will be

λᵢ′ = kᵢ·λᵢ.
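As an illustration, a minimal Python sketch of this duty-cycle correction for a series system; the element failure rates and times are assumed for the example, not taken from the text:

```python
import math

def duty_cycle_failure_rate(lam, t_on, t_total):
    """Failure rate of an element as seen by the system: lam' = k * lam, k = t_on / t_total."""
    return lam * (t_on / t_total)

# Illustrative values (assumed): three series-connected elements.
t_total = 1000.0                  # system operating time, h
elements = [                      # (nominal failure rate 1/h, switched-on time h)
    (2e-5, 1000.0),               # operates continuously
    (5e-5, 300.0),                # operates part of the time
    (1e-4, 20.0),                 # short switching operations only
]

lam_system = sum(duty_cycle_failure_rate(lam, t_on, t_total) for lam, t_on in elements)
p_system = math.exp(-lam_system * t_total)   # exponential law for sudden failures
print(f"system failure rate: {lam_system:.2e} 1/h, P({t_total:.0f} h) = {p_system:.4f}")
```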

The same approach can be used in relation to individual system nodes.

Another factor that should be considered when analyzing system reliability is the level of workload with which the elements operate in the system, since this largely determines the magnitude of the expected failure rate.

The failure rate of elements changes significantly even with small changes in the workload affecting them.

In this case, the main difficulty in the calculation is caused by the variety of factors that determine both the concept of element strength and the concept of load.

The strength of an element combines its resistance to mechanical loads, vibrations, pressure, acceleration, etc. The category of strength also includes resistance to thermal loads, electrical strength, moisture resistance, resistance to corrosion and a number of other properties. Therefore, strength cannot be expressed by some numerical value and there are no units of strength that take into account all these factors. The manifestations of load are also diverse. Therefore, to assess strength and load, statistical methods are used to determine the observed effect of failure of an element over time under the influence of a series of loads or under the influence of a predominant load.

Elements are designed so that they can withstand rated loads. When operating elements under rated load conditions, a certain pattern in the intensity of their sudden failures is observed. This rate is called the nominal sudden failure rate of the elements, and it is the reference value for determining the actual sudden failure rate of the real element (taking into account the operating time and workload).

For a real element or system, three main environmental influences are currently considered: mechanical, thermal and operating loads.

The influence of mechanical effects is taken into account by a coefficient whose value is determined by the installation location of the equipment and can be taken equal to:

for laboratories and comfortable premises - 1;
for stationary ground installations - 10;
for railway rolling stock - 30.

The nominal sudden failure rate selected from Table 3 should be increased by the corresponding factor, depending on where the device is installed in service.

The curves in Fig. 7 illustrate the general nature of the change in the rate of sudden failures of electrical and electronic elements as a function of heating temperature and workload.

As can be seen from these curves, the rate of sudden failures increases logarithmically with increasing workload. The curves also show how the rate of sudden failures of elements can be reduced, even below the nominal value: a significant reduction is achieved when the elements operate at loads below their rated values.


Fig. 7

Fig. 7 can be used for approximate (training) reliability calculations of any electrical and electronic elements. The nominal mode in this case corresponds to a temperature of 80°C and 100% of the working load.

If the design parameters of an element differ from the nominal values, then the increase for the selected parameters can be determined from the curves in Fig. 7, giving a ratio by which the failure rate of the element in question is multiplied.

High reliability can be built into the design of elements and systems. To do this, it is necessary to strive to reduce the temperature of the elements during operation and use elements with increased nominal parameters, which is equivalent to a reduction in workloads.

The increase in the cost of manufacturing the product in any case pays off by reducing operating costs.


The failure rate of electrical circuit elements as a function of load can be determined from empirical formulas. In particular, as a function of operating voltage and temperature:

λ₂ = λ₁ · (U₂/U₁)ᵖ · K^(t₂ − t₁),

where λ₁ is the tabulated failure rate at rated voltage U₁ and temperature t₁, and λ₂ is the failure rate at operating voltage U₂ and temperature t₂. It is assumed that mechanical influences remain at the same level. Depending on the kind and type of element, the value of p varies from 4 to 10 and the value of K within 1.02-1.15.
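A small Python helper implementing this empirical correction as reconstructed above (the exact formula did not survive extraction, so treat the form and the default p and K as assumptions):

```python
def adjusted_failure_rate(lam1, u1, u2, t1, t2, p=5.0, k=1.1):
    """Reconstructed empirical load correction:
    lam2 = lam1 * (U2/U1)**p * K**(t2 - t1),
    with p ~ 4..10 and K ~ 1.02..1.15 depending on the element type."""
    return lam1 * (u2 / u1) ** p * k ** (t2 - t1)

# Illustrative values (assumed): an element run at 80 % of rated voltage
# and 15 degrees below the nominal temperature.
lam2 = adjusted_failure_rate(lam1=1e-6, u1=100.0, u2=80.0, t1=80.0, t2=65.0)
print(f"adjusted failure rate: {lam2:.3e} 1/h")
```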

When determining the actual failure rate of elements, it is necessary to have a good idea of the expected load levels at which the elements will operate, and to calculate the electrical and thermal parameters taking transient modes into account. Correct identification of the loads acting on individual elements leads to a significant increase in the accuracy of reliability calculations.

When calculating reliability taking into account wear-out failures, the operating conditions must also be taken into account. The durability values M given in Table 3 likewise refer to the nominal load mode and laboratory conditions. Elements operating under other conditions have a durability that differs from the tabulated value by a factor K. The value of K can be taken equal to:

for laboratories - 1.0;
for ground installations - 0.3;
for railway rolling stock - 0.17.

Small variations of the coefficient K are possible for equipment of various purposes.

To determine the expected durability M, the average (nominal) durability taken from the table is multiplied by the coefficient K.

In the absence of materials necessary to determine failure rates depending on load levels, the coefficient method for calculating failure rates can be used.

The essence of the coefficient calculation method is that when calculating equipment reliability criteria, coefficients are used that relate the failure rate of elements of various types with the failure rate of an element whose reliability characteristics are reliably known.

It is assumed that the exponential reliability law is valid and that the failure rates of elements of all types vary with operating conditions to the same extent. The latter assumption means that under different operating conditions the following relation holds:

λᵢ = Kᵢ·λ₀,

where λ₀ is the failure rate of an element whose quantitative characteristics are reliably known, and Kᵢ is the reliability coefficient of the i-th element. The element with failure rate λ₀ is called the main element of the calculation; when calculating the coefficients Kᵢ, a fixed wirewound resistor is taken as the main element. In this case, to calculate the reliability of the system it is not necessary to know the failure rates of elements of all types: it is sufficient to know the reliability coefficients Kᵢ, the number of elements of each type in the circuit, and the failure rate λ₀ of the main element. Since Kᵢ has a scatter of values, reliability is checked both for K_min and for K_max. Values of Kᵢ, determined from an analysis of failure-rate data for equipment of various purposes, are given in Table 5.

Table 5

The failure rate of the main element of the calculation (in this case, the resistor) should be determined as the weighted average of the failure rates of the resistors used in the designed system, i.e.

λ₀ = (λ₁N₁ + λ₂N₂ + … + λ_mN_m) / (N₁ + N₂ + … + N_m),

where λᵢ and Nᵢ are the failure rate and the number of resistors of the i-th type and rating, and m is the number of types and ratings of resistors.

It is advisable to construct the resulting dependence of system reliability on operating time both for K_min and for K_max.
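A minimal sketch of the coefficient method in Python; the Kᵢ ranges and element counts below are assumed for illustration (Table 5 is not reproduced in this text):

```python
import math

# Weighted-average failure rate of the main element (resistors): (lambda_i, count) per type.
resistors = [(0.4e-6, 120), (0.6e-6, 35)]
lam0 = sum(l * n for l, n in resistors) / sum(n for _, n in resistors)

# (K_min, K_max, count) for each element type in the circuit -- assumed numbers.
elements = [(1.0, 1.0, 155),    # resistors (main element, K = 1)
            (1.5, 2.5, 80),     # capacitors
            (4.0, 9.0, 12)]     # relays

lam_min = lam0 * sum(k_min * n for k_min, _, n in elements)
lam_max = lam0 * sum(k_max * n for _, k_max, n in elements)
for label, lam in (("K_min", lam_min), ("K_max", lam_max)):
    print(f"{label}: lambda = {lam:.3e} 1/h, P(1000 h) = {math.exp(-lam * 1000):.4f}")
```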

Having information about the reliability of individual elements included in the system, it is possible to give a general assessment of the reliability of the system and identify blocks and assemblies that require further improvement. To do this, the system under study is divided into nodes according to constructive or semantic characteristics (a block diagram is drawn up). For each selected node, reliability is determined (nodes with less reliability require revision and improvement first).

When comparing the reliability of components, and even more so of different system options, it should be remembered that the absolute value of reliability does not reflect the behavior of the system in operation and its effectiveness. The same level of system reliability can be achieved in one case due to the main elements, the repair and replacement of which requires significant time and large material costs (for an electric locomotive, removal from train work); in another case, these are small elements, the replacement of which is carried out by the maintenance personnel without removing the machine from work. Therefore, for a comparative analysis of designed systems, it is recommended to compare the reliability of elements that are similar in their meaning and consequences arising from their failures.

When making approximate reliability calculations, one can use data from operating experience with similar systems, which to some extent takes operating conditions into account. In this case the calculation can be carried out in two ways: from the average level of reliability of equipment of the same type, or with a conversion factor to real operating conditions.

The calculation based on the average level of reliability rests on the assumption that the designed equipment and the operating sample are of equal reliability. This is permissible with identical elements, similar systems and the same ratio of element types in the system.

The essence of the method is expressed by the relation

N₁·T₁ = N₂·T₂,

where N₁ and T₁ are the number of elements and the mean time between failures of the sample equipment, and N₂ and T₂ are the same for the designed equipment. From this relation it is easy to determine the mean time between failures for the designed equipment:

T₂ = T₁·N₁ / N₂.

The advantage of the method is its simplicity. Disadvantages - the absence, as a rule, of a sample of operating equipment suitable for comparison with the designed device.

The basis of the calculation by the second method is the determination of a conversion factor that takes into account the operating conditions of similar equipment. To determine it, a similar system operated under the specified conditions is selected; the other requirements need not be met. For the selected operating system, reliability indicators are determined using the data of Table 3, and the same indicators are determined separately from operational data.

The conversion factor is defined as the ratio

K_e = T_oe / T_oz,

where T_oe is the mean time between failures according to operating data and T_oz is the mean time between failures according to calculation.

For the designed equipment, reliability indicators are calculated using the same tabular data as for the operating system. Then the results obtained are multiplied by K e.

The coefficient K_e takes into account real operating conditions: preventive repairs and their quality, replacement of parts between repairs, the qualifications of maintenance personnel, the condition of depot equipment, etc., which cannot be foreseen by other calculation methods. The value of K_e may be greater than one.
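A sketch of both operating-experience methods in Python; all numbers are illustrative assumptions, not data from the text:

```python
# Method 1: equal average reliability level -> N1*T1 = N2*T2.
def mtbf_from_sample(t_sample_h, n_sample, n_designed):
    """MTBF of the designed equipment from a comparable operating sample."""
    return t_sample_h * n_sample / n_designed

# Method 2: conversion factor from a similar operating system, K_e = T_oe / T_oz.
def converted_mtbf(t_calc_designed_h, t_oper_h, t_calc_similar_h):
    k_e = t_oper_h / t_calc_similar_h
    return t_calc_designed_h * k_e

print(mtbf_from_sample(t_sample_h=2000.0, n_sample=400, n_designed=500))   # 1600 h
print(converted_mtbf(t_calc_designed_h=3000.0, t_oper_h=1800.0,
                     t_calc_similar_h=2400.0))                             # 2250 h
```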

Any of the considered calculation methods can also be applied in reverse, for a specified reliability: from the required system reliability and mean time between failures to the choice of indicators for the constituent elements.

1.1 Probability of failure-free operation

The probability of failure-free operation is the probability that, under certain operating conditions, within a given operating time, not a single failure will occur.
The probability of failure-free operation is denoted P(l) and is determined by formula (1.1):

P(l) = (N₀ − r(l)) / N₀, (1.1)

where N₀ is the number of elements at the beginning of the test and r(l) is the number of element failures accumulated by operating time l. It should be noted that the larger the value of N₀, the more accurately the probability P(l) can be calculated.
At the start of operation of a serviceable locomotive P(0) = 1: at mileage l = 0 the probability that not a single element fails takes its maximum value of 1. With increasing mileage l the probability P(l) decreases, and as the service life approaches infinity the probability of failure-free operation tends to zero, P(l→∞) = 0. Thus, during operation the probability of failure-free operation varies from 1 to 0. The nature of its change as a function of mileage is shown in Fig. 1.1.

Fig. 1.1. Graph of the change in the probability of failure-free operation P(l) depending on operating time

The main advantages of this indicator are two: first, the probability of failure-free operation covers all factors affecting the reliability of the elements, making it possible to judge reliability quite simply, since the larger the value of P(l), the higher the reliability; second, the probability of failure-free operation can be used in calculating the reliability of complex systems consisting of more than one element.

1.2 Probability of failure

The probability of failure is the probability that, under certain operating conditions, within a given operating time, at least one failure will occur.
The probability of failure is denoted Q(l) and is determined by formula (1.2):

Q(l) = r(l) / N₀ = 1 − P(l). (1.2)

At the start of operation of a serviceable locomotive Q(0) = 0: at mileage l = 0 the probability that at least one element fails takes its minimum value of 0. With increasing mileage l the probability of failure Q(l) increases, and as the service life approaches infinity the probability of failure tends to unity, Q(l→∞) = 1. Thus, during operation the value of the probability of failure varies from 0 to 1. The nature of its change as a function of mileage is shown in Fig. 1.2. The probability of failure-free operation and the probability of failure are opposite and incompatible events.

Fig. 1.2. Graph of the change in the probability of failure Q(l) depending on operating time

1.3 Failure frequency

The failure frequency is the ratio of the number of elements that failed per unit time (or mileage) to the initial number of elements under test. In other words, the failure frequency is an indicator characterizing the rate of change of the probability of failure and of the probability of failure-free operation as the operating time grows.
The failure frequency is denoted a(l) and is determined by formula (1.3):

a(l) = Δr(l) / (N₀·Δl), (1.3)

where Δr(l) is the number of elements that failed during the mileage interval Δl.
The value of this indicator makes it possible to judge how many elements will fail over a given period of time or mileage and to calculate the number of required spare parts.
The nature of the change in the failure frequency as a function of mileage is shown in Fig. 1.3.


Fig. 1.3. Graph of the change in failure frequency depending on operating time

1.4 Failure rate

The failure rate is the conditional density of occurrence of a failure of an object, determined for the considered moment of time or operating time, provided that no failure has occurred before this moment. Equivalently, the failure rate is the ratio of the number of elements that failed per unit time (or mileage) to the number of elements remaining operable in the given interval.
The failure rate is denoted λ(l) and is determined by formula (1.4):

λ(l) = Δr(l) / (N_s·Δl), (1.4)

where Δr(l) is the number of failed elements and N_s is the average number of elements remaining operable in the interval Δl.

As a rule, the failure rate is a non-decreasing function of time. Failure rate is usually used to assess the propensity for failure at various points in the operation of objects.
Fig. 1.4 presents the theoretical nature of the change in the failure rate as a function of mileage.

Fig. 1.4. Graph of the change in failure rate depending on operating time

On the failure rate graph shown in Fig. 1.4, three main stages can be distinguished, reflecting the life of an element or of the object as a whole.
The first stage, which is also called the running-in stage, is characterized by an increase in the failure rate during the initial period of operation. The reason for the increase in failure rate at this stage is hidden manufacturing defects.
The second stage, or the period of normal operation, is characterized by the tendency of the failure rate to a constant value. During this period, random failures may occur due to the occurrence of sudden load concentrations exceeding the ultimate strength of the element.
The third stage is the so-called period of accelerated aging. It is characterized by the occurrence of wear-out failures; further operation of the element without replacement becomes economically irrational.

1.5 Mean time to failure

Mean time to failure is the average mileage of an element before its failure.
Mean time to failure is denoted L₁ and is determined by formula (1.5):

L₁ = (Σᵢ lᵢ) / r, (1.5)

where lᵢ is the mileage to failure of the i-th element and r is the number of failures.
Mean time to failure can be used to preliminarily determine the timing of repair or replacement of an element.

1.6 Average value of failure flow parameter

The average value of the failure flow parameter characterizes the average probability density of occurrence of a failure of an object, determined for the considered moment of time.
The average value of the failure flow parameter is denoted W_av and is determined by formula (1.6):

W_av = Σr(l) / (N₀·l). (1.6)

1.7 Example of calculating reliability indicators

Initial data.
During the run from 0 to 600 thousand km, information on traction motor failures was collected at a locomotive depot. The number of serviceable traction electric motors (TEDs) at the beginning of the operation period was N₀ = 180. The total number of motors that failed over the analyzed period was Σr(600000) = 60. The mileage interval Δl was taken as 100 thousand km. The number of failed TEDs in each interval was: 2, 12, 16, 10, 14, 6.

Required.
It is necessary to calculate reliability indicators and plot their changes over time.

First, the table of initial data is filled in, as shown in Table 1.1.

Table 1.1

Initial data for calculation

l, thousand km     0-100   100-200   200-300   300-400   400-500   500-600
Δr(l)                  2        12        16        10        14         6
Σr(l)                  2        14        30        40        54        60

First, using equation (1.1), we determine the probability of failure-free operation for each section of the run. For the sections from 0 to 100 and from 100 to 200 thousand km:

P(100) = (180 − 2)/180 = 0.989;  P(200) = (180 − 14)/180 = 0.922.

Let us calculate the failure frequency using equation (1.3). For the section 0-100 thousand km:

a(0-100) = 2 / (180 · 100 000 km) = 1.111·10⁻⁷ 1/km.

In a similar way we determine the failure frequency for the interval 100-200 thousand km:

a(100-200) = 12 / (180 · 100 000 km) = 6.667·10⁻⁷ 1/km.

Using equations (1.5 and 1.6), we determine the average time to failure and the average value of the failure flow parameter.

Let us systematize the obtained results and present them in Table 1.2.

Table 1.2

Results of calculating reliability indicators

l, thousand km      0-100   100-200   200-300   300-400   400-500   500-600
Δr(l)                   2        12        16        10        14         6
Σr(l)                   2        14        30        40        54        60
P(l)                0.989     0.922     0.833     0.778     0.700     0.667
Q(l)                0.011     0.078     0.167     0.222     0.300     0.333
a(l)·10⁻⁷, 1/km     1.111     6.667     8.889     5.556     7.778     3.333
λ(l)·10⁻⁷, 1/km     1.117     6.977    10.127     6.897    10.526     4.878
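A short Python check of this worked example; it recomputes the indicators from the interval failure counts (using the average number of operable motors in each interval for λ) and reproduces the values of Table 1.2:

```python
N0 = 180
dl = 100_000.0                        # interval width, km
dr = [2, 12, 16, 10, 14, 6]           # failures per interval

cum = 0
survivors = N0
for i, r in enumerate(dr):
    a = r / (N0 * dl)                 # failure frequency, 1/km
    n_avg = survivors - r / 2         # average operable count in the interval
    lam = r / (n_avg * dl)            # failure rate, 1/km
    cum += r
    survivors -= r
    P = (N0 - cum) / N0
    print(f"{i*100}-{(i+1)*100} th.km: P={P:.3f} Q={1-P:.3f} "
          f"a={a*1e7:.3f}e-7 lam={lam*1e7:.3f}e-7")
```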

Let us present the nature of the change in the probability of failure-free operation of the electric motor as a function of mileage (Fig. 1.5). Note that at the first point of the graph, i.e. at a mileage of 0, the probability of failure-free operation takes its maximum value of 1.

Fig. 1.5. Graph of the change in the probability of failure-free operation depending on operating time

Let us present the nature of the change in the probability of failure of the electric motor as a function of mileage (Fig. 1.6). Note that at the first point of the graph, i.e. at a mileage of 0, the probability of failure takes its minimum value of 0.

Fig. 1.6. Graph of the change in the probability of failure depending on operating time

Let us present the nature of the change in the failure frequency of the electric motors as a function of mileage (Fig. 1.7).

Fig. 1.7. Graph of the change in failure frequency depending on operating time

Fig. 1.8 presents the dependence of the failure rate on operating time.

Fig. 1.8. Graph of the change in failure rate depending on operating time

2.1 Exponential law of distribution of random variables

The exponential law describes quite accurately the reliability of units subject to sudden failures of a random nature. Attempts to apply it to other kinds of failures, especially gradual ones caused by wear and by changes in the physicochemical properties of elements, have shown its limited applicability.

Initial data.
As a result of testing ten high-pressure fuel pumps, the following operating times to failure were obtained: 400, 440, 500, 600, 670, 700, 800, 1200, 1600, 1800 hours. Assume that the operating time to failure of the fuel pumps obeys an exponential distribution law.

Required.
Estimate the failure rate, and also calculate the probability of failure-free operation over the first 500 hours and the probability of failure in the interval between 800 and 900 hours of diesel operation.

First, we determine the mean operating time of the fuel pumps to failure:

T_av = (400 + 440 + 500 + 600 + 670 + 700 + 800 + 1200 + 1600 + 1800) / 10 = 871 h.

Then we calculate the failure rate:

λ = 1/T_av = 1/871 ≈ 1.15·10⁻³ 1/h.

The probability of failure-free operation of the fuel pumps over an operating time of 500 hours will be:

P(500) = e^(−λ·500) = e^(−0.574) ≈ 0.56.

The probability of failure between 800 and 900 hours of pump operation will be:

Q(800; 900) = P(800) − P(900) = e^(−0.918) − e^(−1.033) ≈ 0.399 − 0.356 = 0.043.
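The same computation in Python, for checking:

```python
import math

runs = [400, 440, 500, 600, 670, 700, 800, 1200, 1600, 1800]  # hours to failure
t_av = sum(runs) / len(runs)          # 871 h
lam = 1 / t_av                        # ~1.15e-3 1/h

p500 = math.exp(-lam * 500)                               # ~0.56
q_800_900 = math.exp(-lam * 800) - math.exp(-lam * 900)   # ~0.043
print(f"T_av={t_av:.0f} h, lambda={lam:.2e} 1/h, "
      f"P(500)={p500:.3f}, Q(800..900)={q_800_900:.3f}")
```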

2.2 Weibull-Gnedenko distribution law

The Weibull-Gnedenko distribution law has become widespread and is applied to systems consisting of elements connected in series from the standpoint of ensuring system reliability, for example, the systems serving a diesel generator set: lubrication, cooling, fuel supply, air supply, etc.

Initial data.
The downtime of diesel locomotives during unscheduled repairs due to the fault of auxiliary equipment obeys the Weibull-Gnedenko distribution law with parameters b=2 and a=46.

Required.
It is necessary to determine the probability that a diesel locomotive is restored from unscheduled repair after 24 hours of downtime, and also the downtime during which operability is restored with a probability of 0.95.

Let us find the probability of restoring the locomotive's operability after it has stood idle in the depot for 24 hours, using the Weibull distribution function.

To determine the recovery time of the locomotive for the given confidence probability, we use the same expression solved for time.
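Since the formulas themselves did not survive extraction, here is a sketch under the assumption that the Weibull-Gnedenko distribution function is F(t) = 1 − exp(−(t/a)ᵇ) with scale a = 46 h and shape b = 2:

```python
import math

a, b = 46.0, 2.0      # scale (h) and shape -- assumed parametrization

def recovery_prob(t):
    """F(t) = 1 - exp(-(t/a)**b): probability the repair is finished within t hours."""
    return 1 - math.exp(-((t / a) ** b))

def time_for_prob(p):
    """Invert the CDF: downtime by which operability is restored with probability p."""
    return a * (-math.log(1 - p)) ** (1 / b)

print(f"F(24 h) = {recovery_prob(24):.3f}")        # ~0.24 under this assumption
print(f"t(0.95) = {time_for_prob(0.95):.1f} h")    # ~79.6 h
```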

2.3 Rayleigh distribution law

The Rayleigh distribution law is used mainly to analyze the operation of elements that have a pronounced aging effect (elements of electrical equipment, various types of seals, washers, gaskets made of rubber or synthetic materials).

Initial data.
It is known that the operating time of contactors to failure, in terms of the aging of the coil insulation, can be described by a Rayleigh distribution function with parameter S = 260 thousand km.

Required.
For an operating time of 120 thousand km, it is necessary to determine the probability of failure-free operation, the failure rate, and the mean time to the first failure of the electromagnetic contactor coil.
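The worked solution is not present in the source; a sketch using the standard Rayleigh relations P(l) = exp(−l²/(2S²)), λ(l) = l/S² and L₁ = S·√(π/2):

```python
import math

S = 260.0    # Rayleigh parameter, thousand km
l = 120.0    # operating time, thousand km

P = math.exp(-l**2 / (2 * S**2))    # probability of failure-free operation, ~0.899
lam = l / S**2                      # failure rate, ~1.78e-3 per thousand km
L1 = S * math.sqrt(math.pi / 2)     # mean mileage to first failure, ~326 thousand km
print(f"P(120)={P:.3f}, lambda(120)={lam:.2e} 1/thousand km, L1={L1:.0f} thousand km")
```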

3.1 Basic connection of elements

A system consisting of several independent elements connected functionally in such a way that the failure of any one of them causes a system failure is represented by a reliability block diagram with series-connected events of failure-free operation of the elements.

Initial data.
The non-redundant system consists of five elements. Their failure rates are, respectively, 0.00007; 0.00005; 0.00004; 0.00006; 0.00004 h⁻¹.

Required.
It is necessary to determine the system reliability indicators: failure rate, mean time to failure, probability of failure-free operation and failure frequency. The indicators P(l) and a(l) are to be obtained in the range from 0 to 1000 hours in increments of 100 hours.

Let us calculate the system failure rate and mean time to failure using the equations:

λ = Σλᵢ = 0.00007 + 0.00005 + 0.00004 + 0.00006 + 0.00004 = 0.00026 h⁻¹;
T₁ = 1/λ = 1/0.00026 ≈ 3846 h.

We obtain the values of the probability of failure-free operation and the failure frequency from the equations reduced to the form:

P(l) = e^(−λl);  a(l) = λ·e^(−λl).

The calculation results for P(l) and a(l) over the interval from 0 to 1000 hours of operation are presented in Table 3.1.

Table 3.1

Results of calculating the probability of failure-free operation and the failure frequency of the system over the time interval from 0 to 1000 hours

l, h     P(l)       a(l), h⁻¹
0        1          0.00026
100      0.974355   0.000253
200      0.949329   0.000247
300      0.924964   0.00024
400      0.901225   0.000234
500      0.878095   0.000228
600      0.855559   0.000222
700      0.833601   0.000217
800      0.812207   0.000211
900      0.791362   0.000206
1000     0.771052   0.0002
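A few lines of Python that generate Table 3.1:

```python
import math

lams = [7e-5, 5e-5, 4e-5, 6e-5, 4e-5]   # element failure rates, 1/h
lam = sum(lams)                          # system failure rate: 2.6e-4 1/h
T1 = 1 / lam                             # mean time to failure: ~3846 h

for l in range(0, 1001, 100):
    P = math.exp(-lam * l)               # probability of failure-free operation
    a = lam * P                          # failure frequency
    print(f"{l:5d} h  P={P:.6f}  a={a:.6f} 1/h")
```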

A graphic illustration of P(l) and a(l) over the interval up to the mean time to failure is given in Fig. 3.1 and 3.2.

Fig. 3.1. Probability of failure-free operation of the system.

Fig. 3.2. Failure frequency of the system.

3.2 Redundant connection of elements

Initial data.
Fig. 3.3 and Fig. 3.4 show two structural diagrams of element connection: general redundancy (Fig. 3.3) and element-by-element redundancy (Fig. 3.4). The probabilities of failure-free operation of the elements are, respectively, P₁(l) = P′₁(l) = 0.95; P₂(l) = P′₂(l) = 0.9; P₃(l) = P′₃(l) = 0.85.

Fig. 3.3. Diagram of a system with general redundancy.

Fig. 3.4. Diagram of a system with element-by-element redundancy.

We calculate the probability of failure-free operation of a block of three elements without redundancy:

P = P₁·P₂·P₃ = 0.95 · 0.9 · 0.85 ≈ 0.727.

The probability of failure-free operation of the same system with general redundancy (Fig. 3.3) will be:

P_gen = 1 − (1 − 0.727)² ≈ 0.925.

The probabilities of failure-free operation of each of the three blocks with element-by-element redundancy (Fig. 3.4) will be:

P₁′ = 1 − (1 − 0.95)² = 0.9975;  P₂′ = 1 − (1 − 0.9)² = 0.99;  P₃′ = 1 − (1 − 0.85)² ≈ 0.9775.

The probability of failure-free operation of the system with element-by-element redundancy will be:

P_el = 0.9975 · 0.99 · 0.9775 ≈ 0.965.

Thus, element-by-element redundancy provides a more significant increase in reliability (the probability of failure-free operation increased from 0.925 to 0.965, i.e. by 4%).
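The comparison in Python:

```python
from math import prod

p = [0.95, 0.90, 0.85]   # element reliabilities

p_series = prod(p)                                   # no redundancy: ~0.727
p_general = 1 - (1 - p_series) ** 2                  # duplicate the whole chain: ~0.925
p_elementwise = prod(1 - (1 - pi) ** 2 for pi in p)  # duplicate each element: ~0.965
print(f"series={p_series:.3f}, general={p_general:.3f}, per-element={p_elementwise:.3f}")
```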

Initial data.
Fig. 3.5 shows a system with a combined connection of elements. The probabilities of failure-free operation of the elements have the following values: P₁ = 0.8; P₂ = 0.9; P₃ = 0.95; P₄ = 0.97.

Required.
It is necessary to determine the reliability of the system. It is also necessary to determine the reliability of the same system, provided that there are no backup elements.

Fig. 3.5. Diagram of a system with a combined connection of elements.

For the calculation, the main blocks of the original system must be identified; there are three of them in the presented system (Fig. 3.6). Next we calculate the reliability of each block separately and then find the reliability of the entire system.

Fig. 3.6. The system divided into blocks.

The reliability of the system without redundancy will be:

P = P₁·P₂·P₃·P₄ = 0.8 · 0.9 · 0.95 · 0.97 ≈ 0.663.

Thus, a system without redundancy is 28% less reliable than a system with redundancy.

LECTURE No. 14. Ensuring availability

An information system provides its users with a certain set of services. The required level of availability of these services is considered ensured if the following indicators are within specified limits:

  • Service efficiency. The efficiency of a service is determined in terms of the maximum time to service a request, the number of supported users, etc. It is required that the efficiency does not fall below a predetermined threshold.
  • Unavailability time. If the effectiveness of an information service does not satisfy the imposed restrictions, the service is considered unavailable. It is required that the maximum duration of the unavailability period and the total unavailability time for a certain period (month, year) do not exceed predetermined limits.

In essence, it is required that the information system operates at the required efficiency almost always. For some critical systems (for example, control systems), the unavailability time should be zero, without any “almost”. In this case, they talk about the probability of an unavailability situation occurring and require that this probability does not exceed a given value. To solve this problem, special fault-tolerant systems have been created and are being created, the cost of which, as a rule, is very high.

The vast majority of commercial systems have less stringent requirements, but modern business life imposes quite severe restrictions here, when the number of users served can be measured in the thousands, response time should not exceed a few seconds, and unavailability time should not exceed several hours per year.

The problem of ensuring high availability must be solved for modern configurations built on client/server technology. This means that the entire chain needs protection, from the users (possibly remote) to the critical servers (including the security servers).

The main threats to availability were discussed earlier.

In accordance with GOST 27.002, a failure is understood as an event consisting in the disruption of a product's operable state. In the context of this work, the product is an information system or a component of it.

In the simplest case, we can assume that a failure of any component of a composite product leads to an overall failure and that the distribution of failures in time is a simple Poisson flow of events. In this case, the concepts of failure rate and mean time between failures are introduced, which are related to each other by

λᵢ = 1 / Tᵢ,

where i is the component number, λᵢ is the failure rate and Tᵢ is the mean time between failures.

The failure rates of independent components add up:

λ = Σᵢ λᵢ,

and the mean time between failures for the composite product is given by the relation

T = 1/λ = 1 / Σᵢ(1/Tᵢ).

Already these simple calculations show that if there is a component whose failure rate is much greater than that of the others, then it is this component that determines the mean time between failures of the entire information system. This is a theoretical justification for the principle of strengthening the weakest link first.
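A small numeric illustration of the weakest-link effect (the component MTBFs are assumed for the example):

```python
mtbfs_h = [100_000, 50_000, 2_000]    # component MTBFs, hours (illustrative)
lams = [1 / t for t in mtbfs_h]       # per-component failure rates, 1/h
T = 1 / sum(lams)                     # composite MTBF, ~1887 h
print(f"composite MTBF = {T:.0f} h")  # dominated by the 2000-hour component
```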

The Poisson model allows us to substantiate another very important point: an empirical approach to building high-availability systems cannot be carried out in acceptable time. In the traditional software testing/debugging cycle, by optimistic estimates, each bug fix leads to an exponential decrease (by about half a decimal order) in the failure rate. It follows that, in order to verify experimentally that the required level of availability has been reached, regardless of the testing and debugging technology used, one will have to spend time almost equal to the mean time between failures. For example, to achieve a mean time between failures of 10^5 hours, more than 10^4.5 hours will be required, which is more than three years. This means that other methods are needed for building high-availability systems, methods whose effectiveness has been proven analytically or practically over more than fifty years of development of computer technology and programming.

The Poisson model is applicable in cases where the information system contains single points of failure, that is, components whose failure leads to the failure of the entire system. A different formalism is used to study redundant systems.

In accordance with the statement of the problem, we will assume that there is a quantitative measure of the effectiveness of the information services provided by the product. In this case, the concepts of performance indicators of individual elements and the effectiveness of the functioning of the entire complex system are introduced.

As a measure of availability, we can take the probability of acceptability of the effectiveness of the services provided by the information system over the entire period of time under consideration. The greater the efficiency margin the system has, the higher its availability.

If there is redundancy in the system configuration, the probability that the efficiency of the information services will not fall below the permissible limit during the considered period depends not only on the probability of component failures but also on the time during which the components remain inoperative, since the overall efficiency drops during this time and each subsequent failure may prove fatal. To maximize system availability, the downtime of each component must be minimized. It should also be borne in mind that repair work may, in general, require lowering the efficiency or even temporarily shutting down functional components; this kind of influence also needs to be minimized.

A few terminological notes. In the literature on reliability theory one usually speaks of readiness rather than availability (including high readiness). We preferred the term "availability" to emphasize that an information service must not only be "ready" in itself but accessible to its users even when unavailability is caused by reasons that at first glance have no direct relation to the service (for example, lack of consulting support).

Further, instead of unavailability time one usually speaks of the availability factor. We wanted to draw attention to two indicators, the duration of a single period of unavailability and the total unavailability time, so we preferred the term "unavailability time" as the more capacious one.

When considering reliability issues, it is often convenient to imagine the matter as if the element were subjected to a flow of failures with some intensity λ(t); the element fails at the moment the first event of this flow occurs.

The image of a "failure flow" takes on real meaning if a failed element is immediately replaced with a new one (restored). The sequence of random moments of time at which failures occur (Fig. 3.10) represents a certain flow of events, and the intervals between events are independent random variables distributed according to the corresponding distribution law.

The concept of "failure rate" can be introduced for any reliability law with density f(t); in the general case the failure rate λ will be a variable quantity.

The intensity (or, otherwise, the "hazard") of failures is the ratio of the distribution density of the failure-free operation time of an element to its reliability:

λ(t) = f(t) / p(t). (3.7)

Let us explain the physical meaning of this characteristic. Suppose a large number N of homogeneous elements are tested simultaneously, each until it fails. Denote by n(t) the number of elements that remain serviceable at time t, and by m(t, t+Δt), as before, the number of elements that failed in the short interval (t, t+Δt). The average number of failures per unit time will then be

m(t, t+Δt) / Δt.

Let us divide this quantity not by the total number N of elements tested, but by the number n(t) of elements that are still serviceable at time t. It is easy to verify that for large N this ratio will be approximately equal to the failure rate λ(t):

m(t, t+Δt) / (n(t)·Δt) ≈ λ(t). (3.8)

Indeed, for large N, n(t) ≈ N·p(t), and according to formula (3.4), m(t, t+Δt) ≈ N·f(t)·Δt, so the ratio tends to f(t)/p(t) = λ(t).

In reliability studies, the approximate expression (3.8) is often taken as the definition of the failure rate; that is, it is defined as the average number of failures per unit time per one working element.

The characteristic λ(t) can be given one more interpretation: it is the conditional probability density of failure of the element at a given moment t, provided that it has worked without failure up to moment t. Indeed, consider the probability element λ(t)dt, the probability that in the interval (t, t+dt) the element passes from the "working" state to the "not working" state, given that it was working before moment t. The unconditional probability of failure of the element in the interval (t, t+dt) is f(t)dt. This is the probability of the combination of two events:

A - the element worked properly until moment t;

B - the element failed in the time interval (t, t+dt).

According to the rule of probability multiplication: f(t)dt = P(AB) = P(A) P(B/A).



Considering that P(A) = p(t), we get:

P(B/A) = f(t)dt / p(t) = λ(t)dt,

and the quantity λ(t) is nothing other than the conditional probability density of the transition from the "working" state to the "failed" state at moment t.

If the failure rate λ(t) is known, the reliability p(t) can be expressed through it. Taking into account that f(t) = −p′(t), we write formula (3.7) in the form:

λ(t) = −p′(t) / p(t).

Integrating, we get:

p(t) = exp(−∫₀ᵗ λ(τ)dτ). (3.9)

Thus, reliability is expressed through the failure rate.
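A numeric sketch of relation (3.9), integrating an assumed, linearly growing intensity λ(t) by the trapezoidal rule:

```python
import math

def reliability(lam, t, steps=10_000):
    """p(t) = exp(-integral of lam from 0 to t), trapezoidal rule."""
    h = t / steps
    integral = sum((lam(i * h) + lam((i + 1) * h)) / 2 * h for i in range(steps))
    return math.exp(-integral)

lam = lambda t: 1e-4 * (1 + t / 1000)               # assumed wear-type intensity, 1/h
print(f"p(1000 h) = {reliability(lam, 1000):.4f}")  # exp(-0.15) ~ 0.861
```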

In the special case when λ(t) = λ = const, formula (3.9) gives:

p(t) = e^(−λt), (3.10)

i.e. the so-called exponential reliability law.

Using the image of a "failure flow", one can interpret not only formula (3.10) but also the more general formula (3.9). Let us imagine (quite conventionally!) that an element with an arbitrary reliability law p(t) is subjected to a flow of failures with variable intensity λ(t). Then formula (3.9) for p(t) expresses the probability that not a single failure appears in the interval (0, t).

Thus, under the exponential law, as well as under any other reliability law, the operation of the element starting from the moment of switch-on t = 0 can be imagined as a Poisson flow of failures acting on the element: for the exponential reliability law this flow has a constant intensity λ, and for a non-exponential law a variable intensity λ(t).

Note that this image is suitable only if the failed element is not replaced with a new one. If, as we did before, we immediately replace a failed element with a new one, the failure flow will no longer be Poisson. Indeed, its intensity will depend not only on the time t that has passed since the beginning of the entire process, but also on the time τ that has passed since the random moment of switch-on of this particular element; this means that the flow of events has an aftereffect and is not Poisson.

If throughout the entire process under study the given element is not replaced and can fail no more than once, then, when describing a process that depends on its functioning, one can use the scheme of a Markov random process, but with a variable rather than constant failure rate.

If a non-exponential reliability law differs relatively little from the exponential one, then, for simplification, it can be approximately replaced by an exponential law (Fig. 3.11).

The parameter λ of this law is chosen so as to keep unchanged the mathematical expectation of the failure-free operation time, which, as we know, is equal to the area bounded by the curve p(t) and the coordinate axes. To do this, the parameter λ of the exponential law must be set equal to

λ = 1 / T̄,

where T̄ is the area bounded by the reliability curve p(t). Thus, if we want to characterize the reliability of an element by some average failure rate, we must take as that intensity the reciprocal of the element's mean failure-free operation time.

Above we defined the quantity T̄ as the area bounded by the curve p(t). However, if only the mean failure-free operation time of the element is required, it is simpler to find it directly from the statistical material as the arithmetic mean of all observed values of the random variable T, the operating time of the element before failure. This method can also be applied when the number of experiments is small and does not allow the curve p(t) to be constructed accurately enough.

Example 1. The reliability of an element p(t) decreases with time according to a linear law (Fig. 3.12). Find the failure rate λ(t) and the mean failure-free operation time of the element.

Solution. The given reliability law is

p(t) = 1 − t/t₀  (0 < t < t₀).

According to formula (3.7), in the interval (0, t₀) we have:

λ(t) = f(t)/p(t) = (1/t₀) / (1 − t/t₀) = 1/(t₀ − t),

and the mean failure-free operation time is the area under the curve p(t): T̄ = t₀/2.


Example 3. The distribution density of the element's failure-free operation time is constant in the interval (t₀, t₁) and equal to zero outside this interval (Fig. 3.16). Find the failure rate λ(t).

Solution. We have f(t) = 1/(t₁ − t₀) and p(t) = (t₁ − t)/(t₁ − t₀) for t₀ < t < t₁, whence

λ(t) = f(t)/p(t) = 1/(t₁ − t)  (t₀ < t < t₁).

The failure rate graph is shown in Fig. 3.17; as t → t₁, λ(t) → ∞.

The failure rate is the ratio of the number of failed equipment samples per unit time to the average number of samples operating properly in the given time interval, provided that failed samples are neither restored nor replaced with serviceable ones.

This characteristic is denoted λ(t). By definition,

λ(t) = n(t) / (N_av·Δt), (1.20)

where n(t) is the number of samples that failed in the time interval from t to t + Δt; Δt is the time interval; N_av = (Nᵢ + Nᵢ₊₁)/2 is the average number of properly working samples in the interval; Nᵢ is the number of properly working samples at the beginning of the interval and Nᵢ₊₁ at its end.

Expression (1.20) is the statistical definition of the failure rate. To give this characteristic a probabilistic representation, let us establish the relationship between the failure rate, the probability of failure-free operation and the failure frequency.

Let us substitute into expression (1.20) the expression for n(t) from formulas (1.11) and (1.12). Then we obtain:

λ(t) = a(t)·N₀ / N_av.

Taking into account expression (1.3) and the fact that N_av = N₀ − n(t), we find:

λ(t) = a(t) / P(t).

Letting Δt tend to zero and passing to the limit, we obtain:

λ(t) = −(1/P(t))·(dP(t)/dt). (1.21)

Integrating expression (1.21), we obtain:

P(t) = exp(−∫₀ᵗ λ(t)dt). (1.22)

Since a(t) = −dP(t)/dt, then on the basis of expression (1.21) we obtain:

λ(t) = a(t) / P(t), (1.23)

whence

a(t) = λ(t)·exp(−∫₀ᵗ λ(t)dt). (1.24)

Expressions (1.22)-(1.24) establish the relationship between the probability of failure-free operation, the failure frequency and the failure rate.


Expression (1.23) can serve as a probabilistic definition of the failure rate.

The failure rate as a quantitative characteristic of reliability has a number of advantages. It is a function of time and makes it possible to clearly establish the characteristic periods of equipment operation, which can significantly improve equipment reliability. Indeed, if the running-in time (t₁) and the end-of-life time (t₂) are known, then the burn-in time of the equipment before the start of its operation and its service life before repair can be set reasonably. This makes it possible to reduce the number of failures in operation, i.e. ultimately leads to increased equipment reliability.

The failure rate as a quantitative characteristic of reliability has the same drawback as the failure frequency: it allows the reliability of equipment to be characterized fairly simply only up to the first failure. It is therefore a convenient reliability characteristic for single-use systems and, in particular, for the simplest elements.

From the known characteristic λ(t), the remaining quantitative characteristics of reliability are determined most simply.

The indicated properties of the failure rate allow it to be considered the main quantitative characteristic of reliability for the simplest radio-electronic elements.
