# Issues

## Gamma Software issues

We observed early on in pygamma-agreement development that we weren’t able to perfectly match [mathet2015]’s results from their closed-source Java implementation (the “Gamma Software”). In an effort to understand these discrepancies between our implementation of the gamma measure and theirs, we decompiled their application and carefully studied its code. This allowed us to find a number of small (yet significant) implementation details that were either undocumented or arbitrary.

In this section, we list all the details that might make our calculation of the gamma-agreement deviate from the Java implementation’s calculation, and explain our own implementation choice for each.

Warning

What we call “undocumented” are implementation choices found in the Gamma Software that are not mentioned or explained in [mathet2015] or [mathet2018]. We chose to replicate some of those in pygamma-agreement, and not others.

### 1. Average number of annotations per annotator

In [mathet2015], section 4.3, a value is defined as such:

“let \(\bar{x}={\frac{\sum_{i=1}^{n}x_i}{n}}\) be the average number of annotations per annotator”

This value is involved in the computation of the disorder of an alignment.

**In the Gamma Software:**
an int-instead-of-float division transforms this value into
\(\bar{x}=\lfloor{\frac{\sum_{i=1}^{n}x_i}{n}}\rfloor\).

**In pygamma-agreement:**
We chose not to replicate this small discrepancy, as it seemed like a bug and didn’t
weigh much on the value of the gamma agreement.

Although it has no influence over which alignment is considered the best alignment, it slightly changes the value of the disorders, which tweaks the gamma agreement for small continua.
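To make the difference concrete, here is a minimal sketch in plain Python. The annotation counts are hypothetical values chosen for illustration and do not come from either implementation. Integer division is emulated with `//`, which gives the same result as Java’s `int` division for non-negative operands.

```python
# Hypothetical annotation counts for n = 3 annotators (illustrative values only).
annotations_per_annotator = [4, 5, 5]

n = len(annotations_per_annotator)
total = sum(annotations_per_annotator)

# Definition from the paper: a true (floating-point) average.
x_bar_float = total / n

# Integer division, as happens when both operands are ints in Java.
x_bar_floor = total // n

print(x_bar_float)  # 4.666666666666667
print(x_bar_floor)  # 4
```

The fewer annotations there are per annotator, the larger the relative gap between the two values, which is consistent with the effect being visible mostly on small continua.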

### 2. Minimal distance between pivots

In [mathet2015], section 5.2.1, *Mathet et al.* explain their method of sampling continua by shuffling the reference
continuum using random shift positions, and they specify a constraint on those positions:

“To limit this phenomenon, we do not allow the distance between two shifts to be less than the average length of units.”

**In the Gamma Software:**
The value used for this minimal distance is actually **half** the average length of units.

**In pygamma-agreement:**
We decided to include this discrepancy in the `ShuffleContinuumSampler`, as it is designed to
mimic the Java implementation’s sampler, as opposed to the `StatisticalContinuumSampler` that `pygamma-agreement` uses by default.
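The effect of the halved threshold can be illustrated with a small rejection-sampling sketch. The function `sample_shift_positions`, its parameters, and the unit lengths below are hypothetical names and values introduced for illustration only. This is not the code of `ShuffleContinuumSampler` or of the Gamma Software, merely one simple way to enforce a minimal distance between randomly drawn positions.

```python
import random


def sample_shift_positions(continuum_length, n_shifts, min_distance, rng,
                           max_attempts=10_000):
    """Draw n_shifts positions in [0, continuum_length] that are pairwise
    at least min_distance apart (hypothetical helper, rejection sampling)."""
    positions = []
    attempts = 0
    while len(positions) < n_shifts:
        attempts += 1
        if attempts > max_attempts:
            raise ValueError("could not place shifts with this minimal distance")
        candidate = rng.uniform(0, continuum_length)
        # Keep the candidate only if it is far enough from every accepted position.
        if all(abs(candidate - p) >= min_distance for p in positions):
            positions.append(candidate)
    return sorted(positions)


# Hypothetical unit lengths; their average is 6.0.
unit_lengths = [4.0, 6.0, 8.0]
average_length = sum(unit_lengths) / len(unit_lengths)

rng = random.Random(0)
# Constraint as stated in the paper: the full average length.
paper_positions = sample_shift_positions(100.0, 3, average_length, rng)
# Constraint as found in the Gamma Software: half the average length.
software_positions = sample_shift_positions(100.0, 3, average_length / 2, rng)
```

With the halved threshold, positions that the paper’s constraint would reject become acceptable, so shifts can end up closer to one another.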

### 3. Pairing confidence

In [mathet2018], section 4.2.3, the pairing confidence of a pair of annotations is defined as such:

“for \(pair_i = (u_j, u_k)\), \(p_i = max(0, 1 - d_{pos}(u_j, u_k))\)”

**In the Gamma Software:**
Their implementation of this formula uses a combined dissimilarity
\(d_{\alpha, \beta} = \alpha d_{pos} + \beta d_{cat}\), which transforms the formula for the pairing confidence this
way: “\(pair_i = (u_j, u_k)\), \(p_i = max(0, 1 - \alpha \times d_{pos}(u_j, u_k))\)”.

**In pygamma-agreement:**
Although it looked a lot like a bug, ignoring it makes the values of gamma-cat/k too different from those
of the Gamma Software. We chose to include the alpha factor, as setting it to 1.0 can remove the discrepancy:

```
from pygamma_agreement import CombinedCategoricalDissimilarity

# `continuum` is assumed to be an existing Continuum instance
dissimilarity = CombinedCategoricalDissimilarity(alpha=3.0,  # Set any alpha value you want
                                                 beta=2.0,
                                                 delta_empty=1.0)
gamma_results = continuum.compute_gamma(dissimilarity)
# gamma_results stores the dissimilarity used for computing the
# best alignments, as it is needed for computing gamma-cat
dissimilarity.alpha = 1.0
print(f"gamma-cat is {gamma_results.gamma_cat}")  # Gamma-k can also be influenced by alpha
dissimilarity.alpha = 3.0  # Add this line if you want to reuse the dissimilarity with alpha = 3
```
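As a purely numerical illustration of the two pairing-confidence formulas quoted in this section, the sketch below evaluates them for a few positional dissimilarities. The helper `pairing_confidence` and the input values are hypothetical and are not part of the `pygamma-agreement` API.

```python
def pairing_confidence(d_pos, alpha=1.0):
    """p_i = max(0, 1 - alpha * d_pos); alpha = 1.0 gives the formula
    from the paper (hypothetical helper, for illustration only)."""
    return max(0.0, 1.0 - alpha * d_pos)


d_pos = 0.25  # hypothetical positional dissimilarity of a pair

paper_value = pairing_confidence(d_pos)                # alpha = 1, as in the paper
software_value = pairing_confidence(d_pos, alpha=3.0)  # alpha = 3, as in the Gamma Software

print(paper_value)     # 0.75
print(software_value)  # 0.25

# With alpha = 3, any pair with d_pos >= 1/3 gets a confidence of zero.
print(pairing_confidence(0.5, alpha=3.0))  # 0.0
```

With a larger alpha, the confidence of a pair drops to zero for smaller positional dissimilarities, which is why the choice of alpha can change gamma-cat and gamma-k.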

### 4. Best alignment

Since the Mixed Integer Programming (MIP) solvers used in pygamma-agreement are not the same as the one used by the Gamma Software, the best alignments found by the two programs may differ when multiple best alignments with the same disorder exist.

**In the Gamma Software:**
The MIP solver used is `liblpsolve`.

**In pygamma-agreement:**
The MIP solver used is `GLPK`, or the faster `CBC` if it is installed.

Although this doesn’t weigh on the value of gamma, it slightly affects the values of gamma-cat and gamma-k. Thus, there is no way to be sure of obtaining the same results as the Gamma Software for gamma-cat/k.

## How to obtain the results from the Gamma Software

This part explains how to obtain output *almost* identical to that of the Gamma Software using `pygamma-agreement`.
The two main differences are the sampler and the alpha value:

### Sampler

The sampler `pygamma-agreement` uses by default is **not** the one described in [mathet2015]. Our sampler collects
statistical data about the input continuum (averages and standard deviations of several values, such as the length of
annotations), which is then used to generate the samples. We made this choice because we felt that their sampler, which simply
re-shuffles the input continuum, was unconvincing for the need of ‘true’ randomness.

To re-activate their sampler, you can use the `--mathet-sampler` (or `-m`) option when using the command line, or
manually set the sampler used for computing the gamma agreement in Python:

```
from pygamma_agreement import ShuffleContinuumSampler
...
gamma_results = continuum.compute_gamma(sampler=ShuffleContinuumSampler(),
                                        precision_level=0.01)
```

### Alpha value

The Gamma Software uses \(\alpha=3\) in the combined categorical dissimilarity.

To set it in the command line interface, simply use the `--alpha 3` (or `-a 3`) option.
In Python, you need to manually create the combined categorical dissimilarity with the `alpha=3` parameter:

```
from pygamma_agreement import CombinedCategoricalDissimilarity, ShuffleContinuumSampler

dissim = CombinedCategoricalDissimilarity(alpha=3)
gamma_results = continuum.compute_gamma(dissim,
                                        sampler=ShuffleContinuumSampler(),
                                        precision_level=0.01)
```

## Bugs in former versions of pygamma-agreement

This section addresses fatal errors in release 0.1.6 of `pygamma-agreement`, whose consequence was a wrong
output for gamma or other values. These have been fixed in version 1.0.0.

### 1. Average number of annotations per annotator

In [mathet2015], section 4.3, a value is defined as such:

“let \(\bar{x}={\frac{\sum_{i=1}^{n}x_i}{n}}\) be the average number of annotations per annotator”

A misreading made us interpret this value as the **total number of annotations** in the continuum. Thus, the values
calculated by `pygamma-agreement` were strongly impacted (a difference as big as *0.2* for small continua).

### 2. Minimal distance between pivots

In [mathet2015], section 5.2.1, *Mathet et al.* explain their method of sampling continua by shuffling the reference
continuum using random shift positions, and they specify a constraint on those positions:

“To limit this phenomenon, we do not allow the distance between two shifts to be less than the average length of units.”

In the previous version of the library, we overlooked this specificity of the sampling algorithm, which made the gamma values slightly bigger than expected (even after correction of the previous, far more impactful error).