Structure

1. Vocabulary structure

1.1. General values
Number of items (N): 2,234,436;

Number of lemmas (V): 26,337;
Number of senses: 36,064;

Repetition rate (N/V): 84.84;
Type/Token Ratio (Carroll CTTR: V/√(2N)): 12.45;
Number of neological lemmas: 5,665;
Repetition rate of neologisms: 5.06;
Very high frequency lemmas (fabs > 1700; frel > 1/1000): 116 (with 1,103,67 items and a reading percentage of 62.86%);

High frequency lemmas (fabs > 300; frel > 0.17/1000): 445;
Medium frequency lemmas (fabs > 30; frel > 0.015/1000): 2,866;
Low frequency lemmas (fabs < 30 and > 1): 11,836;
Hapax (H): 8,996;
%H of total V: 37.08%;
Maximum absolute frequency: 141,811;
Maximum relative frequency: 80.81/1000.
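
The two derived indices in this list follow directly from N and V. The following is a minimal Python sketch of both formulas; the function names are illustrative and not part of DiCCA-XV. For the general values, cutting the result off after two decimals (rather than rounding) reproduces the listed figures, but that convention is an inference from the numbers, not something the source states.

```python
import math


def repetition_rate(n_items: int, n_lemmas: int) -> float:
    """Average number of occurrences per lemma (N/V)."""
    return n_items / n_lemmas


def carroll_cttr(n_items: int, n_lemmas: int) -> float:
    """Carroll's corrected type/token ratio: V / sqrt(2N)."""
    return n_lemmas / math.sqrt(2 * n_items)


def truncate(value: float, decimals: int = 2) -> float:
    """Cut off (do not round) after the given number of decimals.

    Assumption: truncation is used only because it matches the
    general values listed in section 1.1.
    """
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


# General values from section 1.1
N, V = 2_234_436, 26_337

rate = truncate(repetition_rate(N, V))
cttr = truncate(carroll_cttr(N, V))
print(f"Repetition rate (N/V): {rate}")
print(f"Carroll CTTR (V/sqrt(2N)): {cttr}")
```

Unlike the plain type/token ratio V/N, the corrected ratio divides by the square root of 2N, which makes it less sensitive to corpus size when comparing subcorpora of different lengths.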
 

1.2. Values according to the textual typology
A Texts:

Number of items (Na): 443,728;
Number of lemmas (Va): 9,049;
Number of senses: 11,398;

Repetition rate (Na/Va): 48.75;
Number of exclusive words: 5,029 (= 55.57%);
Repetition rate of exclusive words: 6.57;
Number of neological lemmas: 1,313;
Repetition rate of neological lemmas: 5.75;

Type/Token Ratio (Carroll CTTR: V/√(2N)): 9.63;

B Texts:

Number of items (Nb): 455,633;
Number of lemmas (Vb): 10,120;
Number of senses: 15,012;
Repetition rate (Nb/Vb): 43.07;
Number of exclusive words: 3,910 (= 38.64%);
Repetition rate of exclusive words: 4.12;
Number of neological lemmas: 2,742;
Repetition rate of neological lemmas: 3.13;

Type/Token Ratio (Carroll CTTR: V/√(2N)): 10.83;

C Texts:

Number of items (Nc): 456,967;
Number of lemmas (Vc): 10,389;
Number of senses: 14,917;
Repetition rate (Nc/Vc): 42.32;
Number of exclusive words: 4,163 (= 40.07%);
Repetition rate of exclusive words: 2.64;
Number of neological lemmas: 1,352;
Repetition rate of neological lemmas: 4.23;

Type/Token Ratio (Carroll CTTR: V/√(2N)): 11.08;

D Texts:

Number of items (Nd): 439,797;
Number of lemmas (Vd): 9,807;
Number of senses: 14,250;
Repetition rate (Nd/Vd): 44.69;
Number of exclusive words: 2,898 (= 30.17%);
Repetition rate of exclusive words: 2.24;
Number of neological lemmas: 2,162;
Repetition rate of neological lemmas: 3.14;

Type/Token Ratio (Carroll CTTR: V/√(2N)): 10.48;

E Texts:

Number of items (Ne): 438,312;
Number of lemmas (Ve): 9,807;
Number of senses: 14,250;
Repetition rate (Ne/Ve): 44.69;
Number of exclusive words: 2,898 (= 30.17%);
Repetition rate of exclusive words: 2.24;
Number of neological lemmas: 2,162;
Repetition rate of neological lemmas: 3.14;

Type/Token Ratio (Carroll CTTR: V/√(2N)): 10.48;
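
The "exclusive words" counts in this section can be read as lemmas attested in one text type and in no other, with the percentage taken over that type's own vocabulary (for the A texts, 5,029 of 9,049 lemmas). The sketch below illustrates that reading with set operations. Both the interpretation and the data are assumptions: the function name is illustrative and the lemma sets are invented placeholders, not corpus data.

```python
from typing import Dict, Set


def exclusive_lemmas(vocab_by_type: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each text type, return the lemmas that occur in no other type."""
    result = {}
    for text_type, lemmas in vocab_by_type.items():
        # Union of the vocabularies of all the other text types
        others = set().union(
            *(v for t, v in vocab_by_type.items() if t != text_type)
        )
        result[text_type] = lemmas - others
    return result


# Hypothetical toy vocabularies (placeholders, not taken from the corpus)
vocab = {
    "A": {"lemma_1", "lemma_2", "lemma_3"},
    "B": {"lemma_1", "lemma_4", "lemma_5"},
    "C": {"lemma_1", "lemma_2", "lemma_4"},
}

exclusive = exclusive_lemmas(vocab)
for text_type, lemmas in exclusive.items():
    share = 100 * len(lemmas) / len(vocab[text_type])
    print(text_type, sorted(lemmas), f"{share:.2f}%")
```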

 


1.3. Functional values

Nominal values (noun, adj., pron. and nom. loc.):

Number of senses: 16,189;
Number of items: 681,148;
Repetition rate: 42.07;
Reading percentage: 38.82%

Verbal values (verbs and verb. loc.):

Number of senses: 5,639;
Number of items: 295,437;
Repetition rate: 52.39;
Reading percentage: 16.83%

Adverbial values (adv., pron. adv. and adv. loc.):

Number of senses: 1,503;
Number of items: 101,522;
Repetition rate: 67.54;
Reading percentage: 5.78%

Article and connector values (art., prep., prep. loc., conj. and conj. loc.):

Number of senses: 589;
Number of items: 617,515;
Repetition rate: 1,048.41;
Reading percentage: 35.19%

Onomastic values (prop. noun):

Number of senses: 8,503;
Number of items: 44,905;
Repetition rate: 5.28;
Reading percentage: 2.56%

Words from other languages (Latin, Greek, Catalan, Arabic, Hebrew...), proverbial phrases and marginal elements:

Number of senses: 959;
Number of items: 14,246;
Repetition rate: 14.85;
Reading percentage: 0.82%
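
The figures for each functional class are related by simple arithmetic: the repetition rate is the number of items divided by the number of senses. The reading percentages appear to be each class's share of the items summed over the six classes (1,754,773) rather than of N from section 1.1. That is an inference from the numbers, not something the source states, and the recomputed values match the listed ones only to within about 0.01. A minimal sketch, with illustrative variable names:

```python
# (senses, items) per functional class, copied from section 1.3
classes = {
    "nominal": (16_189, 681_148),
    "verbal": (5_639, 295_437),
    "adverbial": (1_503, 101_522),
    "connectors": (589, 617_515),
    "onomastic": (8_503, 44_905),
    "other": (959, 14_246),
}

# Assumption: reading percentages are taken over this sum, not over N
total_items = sum(items for _, items in classes.values())

summary = {}
for name, (senses, items) in classes.items():
    rate = items / senses               # repetition rate
    share = 100 * items / total_items   # reading percentage
    summary[name] = (rate, share)
    print(f"{name:11s} rate = {rate:8.2f}   share = {share:5.2f}%")
```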

 

2. Database structure

The DiCCA-XV database consists of several interrelated tables that can be queried by any of their fields. In addition to looking up individual words in the dictionary, users can therefore run complex searches through the interface, based on various criteria:

a) grammatical: listings of adverbial or verbal phrases, verbs with a particular government pattern, or words with a specific prepositional, derivational or inflectional morpheme...

b) lexical and semantic: listings of terms of a given semantic domain or lists of verbs that select a particular argument or require terms with certain semantics in the complement.

c) etymological: Latinisms, loanwords from Catalan or Italian, or fifteenth-century neologisms...

d) onomastic: listings of specific place names, anthroponyms from a certain area or population, or lists of mythological or literary terms...

e) textual: listings of all contexts of a given term, its collocations (in both preceding and following positions), or contexts that have a certain syntactic structure...

Because these criteria can be combined, the researcher can obtain a practically unlimited range of results. This should facilitate the work of philologists, historians and, more generally, anyone interested in a key period and a key geographical area in the formation of modern Spanish.
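
To make the idea of combining criteria across interrelated tables concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption made for illustration: the table names (`lemma`, `sense`, `context`), their columns and the placeholder rows are hypothetical and do not describe the actual DiCCA-XV schema or its contents. The query joins the three tables and filters on a grammatical, an etymological and a textual criterion at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one lemma has many senses, one sense has many contexts
conn.executescript("""
CREATE TABLE lemma (
    id INTEGER PRIMARY KEY, form TEXT, pos TEXT, origin TEXT);
CREATE TABLE sense (
    id INTEGER PRIMARY KEY, lemma_id INTEGER REFERENCES lemma(id),
    domain TEXT);
CREATE TABLE context (
    id INTEGER PRIMARY KEY, sense_id INTEGER REFERENCES sense(id),
    text_type TEXT, quote TEXT);
""")

# Placeholder rows (invented, not corpus data)
conn.executemany("INSERT INTO lemma VALUES (?, ?, ?, ?)", [
    (1, "verb_a", "verb", "Catalan"),
    (2, "noun_b", "noun", "Latin"),
    (3, "verb_c", "verb", "Latin"),
])
conn.executemany("INSERT INTO sense VALUES (?, ?, ?)", [
    (1, 1, "trade"), (2, 2, "law"), (3, 3, "trade"),
])
conn.executemany("INSERT INTO context VALUES (?, ?, ?, ?)", [
    (1, 1, "A", "context one"),
    (2, 1, "B", "context two"),
    (3, 3, "A", "context three"),
])

# Combine three criteria: part of speech, origin and text type
query = """
SELECT l.form, s.domain, c.quote
FROM lemma AS l
JOIN sense AS s ON s.lemma_id = l.id
JOIN context AS c ON c.sense_id = s.id
WHERE l.pos = ? AND l.origin = ? AND c.text_type = ?
ORDER BY c.id
"""
rows = conn.execute(query, ("verb", "Catalan", "A")).fetchall()
print(rows)
```

Because each criterion is just another condition in the `WHERE` clause, adding or dropping one changes the result set without changing the table structure, which is what makes this kind of cross-field search open-ended.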