seriograph: a graphic seriation tool for count tables

SPARTAAS | seriograph
SPARTAAS [Bellanger,Coulon,Husi]

Introduction

To make data tables easier to exploit, we propose a computerised graphic processing tool, the EPPM seriograph (EPPM stands for Ecart Positif aux Pourcentages Moyens, i.e. positive deviation from the average percentage). It does not require specialised statistical skills and is suited to stratified sites, where the study of the evolution of artefacts can rely on the relative chronology provided by the excavation.

The treatment first transforms the table of counts into a table of percentages, the total of each set (each row) being scaled to 100; it is thus the proportions, or frequencies, of the types in the sets that are compared.
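As an illustration of this first step, here is a minimal sketch in base R. The count table and its row and column names are made up for the example; only prop.table is a real base R function, and nothing here is taken from the application's own code.

```r
# Hypothetical count table: rows = sets (contexts), columns = types.
counts <- matrix(
  c(10, 30, 60,
    20, 20, 10,
     5,  0, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("set1", "set2", "set3"),
                  c("typeA", "typeB", "typeC"))
)

# Divide each row by its own total, then scale so that each row sums to 100.
percentages <- prop.table(counts, margin = 1) * 100

print(round(percentages, 1))
```

Each row of percentages sums to 100, so sets of very different sizes become directly comparable.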

The display of positive deviations from the mean percentages (EPPM) shows, in black on a lighter background, the percentage shares that are higher than the mean percentage of the variable, so as to highlight the most significant values in the table. On the seriograph, when a percentage is greater than the average percentage of the type, the excess share (the EPPM) is shown in black, centred on the axis of the type, over the grey background of the percentage bar.

The table is then transformed into a graphic matrix where these percentages are expressed, for each type, by horizontal bars centred on the column axis. When the rows are ordered chronologically, the silhouette formed by the superposition of these frequency bars makes it possible to visualise the evolution over time of the type concerned.

The display of the percentages allows comparison of the different sets but does not provide information on the differences in numbers. To fill this gap, the proportion of the numbers in each class is displayed on the seriograph (weight column).
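The weight column can be sketched as follows. This assumes that the weight of a set is its share of the grand total of counts, which is one reading of the sentence above; the table is hypothetical and the variable names are illustrative only.

```r
# Hypothetical count table: rows = sets (contexts), columns = types.
counts <- matrix(
  c(10, 30, 60,
    20, 20, 10,
     5,  0, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("set1", "set2", "set3"),
                  c("typeA", "typeB", "typeC"))
)

# Assumed definition: share of each set in the total number of counts, in percent.
weight <- rowSums(counts) / sum(counts) * 100

print(round(weight, 1))
```

A set with a small weight rests on few artefacts, so its percentage bars should be read with more caution.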

The processing technique also applies to sets whose chronological order is not known; the rows of the graph are then reorganised so as to obtain boat-shaped silhouettes, following the hypothesis of a chronological evolution corresponding to the seriation model.



Seriograph




Authors:


L. Bellanger

mail: <lise.bellanger@univ-nantes.fr>

P. Husi

mail: <philippe.husi@univ-tours.fr>

A. Coulon

Maintainer:


A. Coulon

mail: <arthur-coulon@outlook.fr>

Get started with the seriograph application



The outputs


Seriograph

We have chosen the seriograph (Desachy 2004). This tool highlights the evolution of ceramics over time and helps to understand commercial relations through imported ceramics. The percentage of each category of ceramics is displayed for each set, and the percentages are calculated independently for each set (row). The display of the percentages allows comparison of the different sets but does not provide information on the differences in numbers. To fill this gap, the proportion of the numbers in each class is displayed on the seriograph (weight column).

The seriograph can be used in a strictly deductive way: the order of the rows (archaeological contexts) is fixed when their succession is known, and only the columns are reordered. This makes it possible to examine the chronological behaviour of the variables and, in particular, to see what does not follow a serial evolution: cyclical phenomena, or problems of intrusion and residuality.



The inputs


Seriation

This input allows you to enable or disable the seriation of the columns. The matrix permutation uses an algorithm called 'reciprocal averages'. Each row is assigned a rank from 1 to n, the number of rows. For each column, a barycentre is calculated by weighting according to the row rank. Finally, the columns are reorganised by sorting them according to their barycentre.
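The column reordering described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the application's implementation: the function name seriate_columns is hypothetical, the counts themselves are assumed to be the weights of the barycentre, and a single pass is performed.

```r
# Sketch: order the columns by the barycentre of their row ranks.
# seriate_columns is a hypothetical helper written for this example.
seriate_columns <- function(counts) {
  counts <- as.matrix(counts)
  ranks <- seq_len(nrow(counts))   # rank 1..n, in the current row order
  # Mean row rank of each column, weighted by the counts in that column.
  # counts * ranks multiplies every row i by ranks[i].
  barycentre <- colSums(counts * ranks) / colSums(counts)
  # Columns with a zero total get NaN and are placed last by order().
  counts[, order(barycentre), drop = FALSE]
}

# Hypothetical table whose rows are already in chronological order.
counts <- matrix(
  c(0, 2, 8,
    5, 6, 1,
    9, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ctx1", "ctx2", "ctx3"),
                  c("typeA", "typeB", "typeC"))
)

print(seriate_columns(counts))
```

In this example typeC is concentrated in the first row and typeA in the last, so the columns come out in the order typeC, typeB, typeA.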

Visualization

This input allows you to select the element to be plotted. There are three options: plot the positive deviation from the average percentage (EPPM, from its French name), plot the frequency, or plot both. The average percentage is calculated for each category (column) on the total number of counts (all classes combined). From the average percentage, we obtain, for each category and each row, the difference between the percentage of the category in the class and the average percentage. The EPPM corresponds to the notion of independence deviation (between rows and columns, between categories and time classes) in a chi-square test approach. Although this approach is fundamental in statistical analysis, the independence deviations here are purely indicative and are not associated with a p-value that could determine the significance of the deviations.
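The calculation described above can be sketched as follows. The function name eppm and the sample table are hypothetical; the code follows the textual description (row percentages, average percentage per column over all counts, positive part of the difference) and is not taken from the application.

```r
# Sketch of the EPPM calculation; eppm is a hypothetical helper.
eppm <- function(counts) {
  counts <- as.matrix(counts)
  # Percentage of each category within each row (each row sums to 100).
  pct <- prop.table(counts, margin = 1) * 100
  # Average percentage of each category over all classes combined.
  mean_pct <- colSums(counts) / sum(counts) * 100
  # Deviation of each row percentage from the column's average percentage.
  deviation <- sweep(pct, 2, mean_pct, FUN = "-")
  # Keep only the positive part: the share drawn in black.
  pmax(deviation, 0)
}

# Hypothetical count table: rows = classes, columns = categories.
counts <- matrix(
  c(10, 30, 60,
    20, 20, 10,
     5,  0, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("set1", "set2", "set3"),
                  c("typeA", "typeB", "typeC"))
)

print(round(eppm(counts), 1))
```

Here typeC makes up 50% of all counts; set1 has 60% of typeC, so its EPPM is 10, while set2 has only 20% and therefore an EPPM of 0.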

Sort periods

The rows are initially in the order of the data table. They can be reordered chronologically: in the interface, drag and drop a row (shown in green) to change the order.

Import your data

You can import your own data by uploading a CSV file containing the contingency table.

The data is in the form of a count table with the contexts in rows and the categories (GT) in columns.

The settings allow you to handle different data frame layouts (header, column separator, ...).

Header

Yes or no option. Do your columns have headers?

Rownames

Yes or no option. Does your table have row names?

Separator

Select the character you want to use to separate the columns.

Quote

Select the quotation marks to use on strings.

Decimal

Select the character to use to indicate the decimal point.

CSV Format and write.table

The expected file is a data.frame with columns separated by semicolons ";".

The import expects a .csv file; a .txt file is also accepted as long as it is laid out like a CSV.

In R, you can export your data frame to a csv file using write.csv2 or write.table. In a csv you can choose a character to separate the columns. In the same way, you can define the character to indicate the decimal point.

write.table(data,file="path/to/name_file.csv",sep=";",dec=".",row.names=FALSE,quote=FALSE)

In Excel you can save in csv format in order to import your data frame.

The import interface allows you to set these values using the 'header', 'decimal', 'separator' and 'quote' options.
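To check an exported file before uploading it, you can read it back in R. The sketch below assumes that the import options map onto the header, sep, dec and quote arguments of base R's read.table; this mapping is an assumption made for the example, and the data frame and temporary file are hypothetical.

```r
# Hypothetical count table stored as a data frame.
data <- data.frame(
  typeA = c(10, 20, 5),
  typeB = c(30, 20, 0),
  typeC = c(60, 10, 15)
)

# Export with the same settings as the write.table call shown above.
path <- tempfile(fileext = ".csv")
write.table(data, file = path, sep = ";", dec = ".",
            row.names = FALSE, quote = FALSE)

# Read the file back with matching settings:
# header = TRUE -> first line holds the column names
# sep = ";"     -> column separator
# dec = "."     -> decimal point
# quote = ""    -> no quotation marks around strings
imported <- read.table(path, header = TRUE, sep = ";", dec = ".", quote = "")

print(imported)
```

If the values read back match the original table, the same header, separator, decimal and quote choices should suit the import interface.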


References





Desachy B. (2004). Le sériographe EPPM : un outil informatisé de sériation graphique pour tableaux de comptages. In: Revue archéologique de Picardie, n°3-4, 2004. Céramiques domestiques et terres cuites architecturales. Actes des journées d'étude d'Amiens (2001-2002-2003) pp. 39-56 ⟨doi:10.3406/pica.2004.2396⟩.