Variation among raters in measurement technique and differences in how raters interpret measurement results are two examples of sources of error variance in rating measures. Clear guidelines for making ratings are necessary for reliability in ambiguous or challenging measurement scenarios. A similar statistic, called pi, was proposed by Scott (1955). Cohen's kappa and Scott's pi differ in how the expected agreement p_e is calculated. Another way of performing reliability testing is to use the intraclass correlation coefficient (ICC). There are several types of ICC; one is defined as "the proportion of variance of an observation due to between-subject variability in the true scores." The range of the ICC is between 0.0 and 1.0 (an early definition of the ICC could range between −1 and +1). The ICC will be high when there is little variation between the scores given to each item by the raters, e.g. if all raters give identical or nearly identical scores to each of the items. The ICC is an improvement over Pearson's r and Spearman's ρ, as it takes into account the differences in ratings for individual segments, along with the correlation between raters. Computing percent agreement is a simple procedure when the scores are restricted to zero and one and there are only two data collectors. With more data collectors, the procedure becomes slightly more complex (Table 2).
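The difference between Cohen's kappa and Scott's pi lies entirely in p_e: kappa computes expected agreement from the product of each rater's own category proportions, while pi squares the pooled proportions. A minimal sketch for the two-rater binary case (the rating data below is hypothetical, invented purely for illustration):

```python
from collections import Counter

def observed_agreement(r1, r2):
    """p_o: fraction of items on which the two raters agree."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Cohen's kappa: p_e is the product of each rater's own
    marginal category proportions, summed over categories."""
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))
    po = observed_agreement(r1, r2)
    return (po - pe) / (1 - pe)

def scott_pi(r1, r2):
    """Scott's pi: p_e squares the pooled (joint) category
    proportions instead of multiplying per-rater marginals."""
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(((c1[k] + c2[k]) / (2 * n)) ** 2
             for k in set(r1) | set(r2))
    po = observed_agreement(r1, r2)
    return (po - pe) / (1 - pe)

# Hypothetical binary scores from two data collectors
r1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
r2 = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
```

With these scores p_o is 0.8 for both statistics, but kappa's p_e is 0.50 while pi's is 0.52, so the two coefficients differ slightly even on identical data.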
However, as long as the scores are limited to only two values, the calculation remains simple: the researcher computes the percent agreement for each row and then averages across the rows. Another advantage of the matrix is that it allows the researcher to determine whether errors are random, and therefore distributed fairly evenly across all raters and variables, or whether a particular data collector frequently records values different from those of the other data collectors. In Table 2, which shows an overall inter-rater agreement of 90%, no single data collector had an excessive number of outlier scores (scores that disagreed with the majority of the raters' scores). Another advantage of this technique is that it allows the researcher to identify variables that may be problematic. Note that Table 2 shows the raters achieved only 60% agreement on variable 10. This variable may warrant review to determine the cause of such low agreement in its rating. On the other hand, when there are more than 12 codes, the increase in the expected kappa value flattens out.
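The row-and-average procedure, together with the two diagnostics the matrix supports (flagging low-agreement variables and counting per-rater outlier scores), can be sketched as follows. The matrix values and the 70% flagging threshold are hypothetical, chosen only for illustration:

```python
from collections import Counter
from itertools import combinations

def row_agreement(row):
    """Percent agreement for one variable (one matrix row):
    the fraction of rater pairs that assigned the same value."""
    pairs = list(combinations(row, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def agreement_report(matrix, threshold=0.7):
    """matrix[i][j] = value data collector j gave on variable i.
    Returns per-variable agreement, the overall average, and the
    indices of variables falling below the flagging threshold."""
    per_var = [row_agreement(row) for row in matrix]
    overall = sum(per_var) / len(per_var)
    flagged = [i for i, a in enumerate(per_var) if a < threshold]
    return per_var, overall, flagged

def outlier_counts(matrix):
    """Per rater, count how often their score differs from the
    majority score on a variable (rows with a tie are skipped)."""
    counts = [0] * len(matrix[0])
    for row in matrix:
        top = Counter(row).most_common()
        if len(top) > 1 and top[0][1] == top[1][1]:
            continue  # no clear majority on this variable
        majority = top[0][0]
        for j, value in enumerate(row):
            if value != majority:
                counts[j] += 1
    return counts

# Hypothetical 3-variable, 5-rater binary matrix
matrix = [
    [1, 1, 1, 0, 1],  # rater 3 is the lone dissenter
    [0, 0, 0, 0, 0],  # perfect agreement
    [1, 0, 1, 1, 1],  # rater 1 is the lone dissenter
]
```

A roughly even spread in `outlier_counts` suggests random error, while one rater accumulating most of the disagreements points to a systematic problem with that data collector.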
As a result, percent agreement could serve the purpose of measuring the amount of agreement. In addition, the increase in the values of the sensitivity performance metric likewise flattens, reaching an asymptote beyond 12 codes.