Chromosome selection

When a chromosome had to be chosen, we used an integrated assessment of the data available for each chromosome in global databases such as Entrez Gene, RefSeq and UniProt.

The analysis used the following parameters:
  • the number of encoded genes,
  • the number of coding single nucleotide polymorphisms (SNPs),
  • the number of genes related to a disease, for example genes mentioned together with the disease name in a single abstract or publication.

The basic criteria for chromosome selection were the lowest number of coding genes and the strongest association between the chromosome's genes and socially important diseases. From the list of socially important diseases for Russia, the following were selected: cancer, atherosclerosis, asthma and psoriasis. Accordingly, the number of coding genes and the number of genes related to these diseases were compiled for each human chromosome. The link between a chromosome and the occurrence and development of socially important diseases was estimated by two approaches:

  1. the frequency of the chromosome's coding SNPs associated with disease development was estimated;
  2. a domestically developed technology for automated text analysis was used to find such associations.
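Whichever approach supplies the disease-linkage estimate, the two basic criteria (few coding genes, strong disease association) have to be combined into a single ordering. The text does not say how this was done, so the following is only a minimal sketch under an assumed rule: each chromosome is ranked once by gene count and once by the fraction of disease-related genes, and the two ranks are summed. The function name, the rank-sum rule and the placeholder chromosome labels and numbers are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Placeholder input, NOT real data: label -> (coding genes, disease-related genes)
toy_stats: Dict[str, Tuple[int, int]] = {
    "chrA": (2000, 100),
    "chrB": (300, 60),
    "chrC": (900, 90),
}


def disease_fraction(n_genes: int, n_related: int) -> float:
    """Fraction of a chromosome's genes that are disease-related (0 if no genes)."""
    return n_related / n_genes if n_genes else 0.0


def rank_chromosomes(stats: Dict[str, Tuple[int, int]]) -> List[str]:
    """Order chromosomes by an assumed rank-sum of the two criteria.

    Criterion 1: fewer coding genes is better.
    Criterion 2: a higher fraction of disease-related genes is better.
    Ties are broken alphabetically by label.
    """
    names = list(stats)
    by_size = sorted(names, key=lambda c: stats[c][0])
    by_link = sorted(names, key=lambda c: disease_fraction(*stats[c]), reverse=True)
    score = {c: by_size.index(c) + by_link.index(c) for c in names}
    return sorted(names, key=lambda c: (score[c], c))


ranking = rank_chromosomes(toy_stats)
print(ranking)
```

A rank-sum avoids having to put gene counts and fractions on a common scale, but it is only one of several reasonable choices; a weighted score would also work.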

The number of coding SNPs related to diseases was taken from the UniProt protein database. The second approach is based on how often a gene name and a disease are mentioned together in a single publication abstract. The analysis was performed using the GeneRIF resource, which summarizes each gene through relevant phrases from scientific papers. Thus, for each chromosome, the number of genes mentioned together with a disease in a single abstract was calculated. It proved difficult to find a clear relation between any chromosome and any disease. Nevertheless, according to our estimate, chromosome 18 best matches the formulated criteria.
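The co-mention count described above can be illustrated with a short sketch. It assumes the input has already been reduced to (gene symbol, text snippet) pairs and that a gene-to-chromosome mapping is available; it uses plain case-insensitive substring matching, which is far simpler than the automated text analysis mentioned in the text. The function name, the record layout, the gene symbols and the chromosome labels are hypothetical placeholders, not actual GeneRIF entries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Diseases named in the text
DISEASE_TERMS: List[str] = ["cancer", "atherosclerosis", "asthma", "psoriasis"]

# Placeholder records, NOT real GeneRIF data: (gene symbol, snippet)
toy_records: List[Tuple[str, str]] = [
    ("GENE_A", "Variants are associated with asthma severity."),
    ("GENE_A", "Expression is reduced in psoriasis lesions."),
    ("GENE_B", "Regulates cell cycle progression."),
    ("GENE_C", "Overexpressed in colorectal cancer."),
]

# Placeholder mapping, NOT real annotation: gene symbol -> chromosome label
toy_gene_to_chrom: Dict[str, str] = {
    "GENE_A": "chr_x",
    "GENE_B": "chr_x",
    "GENE_C": "chr_y",
}


def count_comentioned_genes(
    records: Iterable[Tuple[str, str]],
    gene_to_chrom: Dict[str, str],
    disease_terms: Iterable[str],
) -> Dict[str, int]:
    """Count, per chromosome, the distinct genes co-mentioned with any disease term."""
    terms = [t.lower() for t in disease_terms]
    genes_by_chrom: Dict[str, Set[str]] = defaultdict(set)
    for gene, text in records:
        chrom = gene_to_chrom.get(gene)
        if chrom is None:
            continue  # gene without a known chromosome is skipped
        lowered = text.lower()
        if any(term in lowered for term in terms):
            genes_by_chrom[chrom].add(gene)  # a set counts each gene once
    return {chrom: len(genes) for chrom, genes in genes_by_chrom.items()}


counts = count_comentioned_genes(toy_records, toy_gene_to_chrom, DISEASE_TERMS)
print(counts)
```

Each gene is counted at most once per chromosome, however many snippets mention it, which matches the "number of genes" wording rather than a number of mentions.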

In terms of (a) their relation to socially important diseases and (b) the number of identified plasma proteins, all chromosomes are virtually the same.