We performed a preliminary analysis of satellite monomer family sets from the PlantSat database (Macas et al., 2002) and the GenBank database. Our goal was to infer the primary features relevant to a nucleosome phasing grid along DNA tandem arrays.
Upon scrutinizing the DNA curvature profile sets built for the 974 PlantSat database entries, we observed that a significant part of the monomers reveals a contrasting profile with sharp peaks and troughs. Previously reported evidence shows that a conspicuous amplitude around a range of functional elements (such as transcription start sites, exons, and poly-A signals) indicates the degree of nucleosome positioning precision. Taken together, this led us to conclude that the rise-and-fall feature is important for nucleosome positioning along tandem repeat stretches. We explored the effect in more detail. First, we performed simulations of the maximum amplitude variance based on randomly generated sequences with dinucleotide content congruent to the original PlantSat monomer entries. We observed that the maximum amplitude variance depends exponentially on the monomer length, as was previously deduced for the time series model and extreme value statistics in relation to DNA sequences (Karlin and Altschul, 1990). In particular, the random variable M(n) (the centered maximal segment score) has the closely approximating distribution: $\mathrm{Prob}\{M(n) > x\} \approx 1 - \exp\{-K e^{-A x}\}$.
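The simulation step can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes that "dinucleotide content congruent to the original" can be approximated by sampling from a first-order Markov chain fitted to the original monomer, which preserves dinucleotide frequencies only in expectation. The function names are hypothetical, and `toy_profile` (AT fraction in a sliding window) is a placeholder standing in for a real DNA curvature model.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_model(seq):
    """Count first-order transitions (i.e. dinucleotides) observed in seq."""
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1
    return transitions


def simulate(seq, length, rng):
    """Draw a random sequence from a first-order Markov chain fitted to seq.

    Assumption: dinucleotide content is matched in expectation, not exactly.
    """
    transitions = dinucleotide_model(seq)
    base_counts = Counter(seq)
    bases = sorted(base_counts)
    base_weights = [base_counts[b] for b in bases]

    current = rng.choices(bases, weights=base_weights)[0]
    out = [current]
    while len(out) < length:
        followers = transitions.get(current)
        if followers:
            letters = sorted(followers)
            weights = [followers[letter] for letter in letters]
            current = rng.choices(letters, weights=weights)[0]
        else:
            # Dead end (base seen only as the last symbol): restart from
            # the overall base composition.
            current = rng.choices(bases, weights=base_weights)[0]
        out.append(current)
    return "".join(out)


def toy_profile(seq, window=10):
    """Placeholder profile: AT fraction per window (NOT a curvature model)."""
    at = [1.0 if c in "AT" else 0.0 for c in seq]
    return [sum(at[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def max_amplitude(profile):
    """Difference between the highest peak and the lowest trough."""
    return max(profile) - min(profile)
```

Repeating `simulate` many times for each monomer length and collecting `max_amplitude` values would give the simulated distribution against which the real monomers are compared.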
While the regression lines for the simulated and original data were the same, the residual squares (R²) differed drastically. Fisher’s F test also rejected the homogeneity of the real and simulated data with p < 1e-5, thus implying a non-random component in the real data.
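A comparison of this kind can be sketched as below. The sketch assumes a simple least-squares line per data set and an F statistic formed as the ratio of the residual variances; the exact regression model and test setup used in the study are not stated, so the function names and this formulation are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns slope, intercept, RSS and R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - mean_y) ** 2 for y in ys)
    return {"slope": slope, "intercept": intercept,
            "rss": rss, "r2": 1.0 - rss / tss}


def residual_variance_ratio(xs_a, ys_a, xs_b, ys_b):
    """F statistic comparing residual variances of two fitted lines.

    Returns (F, df_a, df_b). Each fit uses two parameters, hence n - 2
    degrees of freedom per data set.
    """
    fit_a = fit_line(xs_a, ys_a)
    fit_b = fit_line(xs_b, ys_b)
    df_a = len(xs_a) - 2
    df_b = len(xs_b) - 2
    f_stat = (fit_a["rss"] / df_a) / (fit_b["rss"] / df_b)
    return f_stat, df_a, df_b
```

The p-value is the upper tail of the F distribution with `(df_a, df_b)` degrees of freedom. The standard library has no F distribution, so a statistics package would be needed for that step (for example `scipy.stats.f.sf`).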
For the monomers with a significant maximum amplitude value we observed mostly U-shaped binding affinity profiles. Since the regular period of the nucleosome binding profile usually equals the monomer length (except for certain cases in which there are minor peaks with a period usually equal to 140-160 bp), the “proper” junction of monomers implies that the affinity score at the beginning of a monomer should be approximately equal to the affinity score at the end.
There are some cases of reverse U-shaped DNA curvature profiles, but they are significantly less frequent than U-shaped ones. Non-symmetric monomers, with binding profile values either monotonically increasing or decreasing, are exceedingly rare. Monomers with an irregular or non-significant nucleosome affinity trend usually have a quite low maximum amplitude. These monomers probably do not provide a nucleosome binding grid at all.
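The profile categories described above can be made concrete with a rough classifier. This is a hedged sketch: the comparison of the outer thirds with the middle third, the junction check between the first and last score, and the tolerance `tol` are all assumptions chosen for illustration, not criteria taken from the analysis.

```python
def classify_shape(profile, tol=0.25):
    """Label a per-monomer profile as U, inverted-U, monotonic or irregular.

    tol is a hypothetical threshold, expressed as a fraction of the
    profile's maximum amplitude.
    """
    n = len(profile)
    if n < 3:
        raise ValueError("profile needs at least three values")
    amplitude = max(profile) - min(profile)
    if amplitude == 0:
        return "flat"

    third = n // 3
    head = sum(profile[:third]) / third
    middle = sum(profile[third:n - third]) / (n - 2 * third)
    tail = sum(profile[n - third:]) / third

    # "Proper" junction: the score at the start of the monomer should be
    # close to the score at its end.
    junction_gap = abs(profile[0] - profile[-1]) / amplitude
    margin = tol * amplitude

    if junction_gap <= tol:
        if head - middle > margin and tail - middle > margin:
            return "U"
        if middle - head > margin and middle - tail > margin:
            return "inverted-U"
    elif head < middle < tail or head > middle > tail:
        # Coarse trend test on thirds, not strict point-wise monotonicity.
        return "monotonic"
    return "irregular"
```

For instance, a parabola such as `[(i - 10) ** 2 for i in range(21)]` falls into the U class, while a steadily rising profile fails the junction check and is labelled monotonic.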
We calculated periodograms based on the Fast Fourier Transform for a range of satellite tandem arrays. We observed that, apart from the peak dictated by the monomer length, there is a sharp peak approximately corresponding to the nucleosome binding unit (150-180 bp including the linker sequence).
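The periodogram calculation can be illustrated with a short, dependency-free sketch. It uses a direct discrete Fourier transform rather than an FFT (the two give the same spectrum; the FFT is only faster), and the function names and the normalization by signal length are assumptions made for the example.

```python
import cmath
import math


def periodogram(signal):
    """Return (period, power) pairs for the mean-centered signal.

    Direct O(n^2) DFT; an FFT routine such as numpy.fft.rfft would yield
    the same coefficients much faster on long arrays.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        result.append((n / k, abs(coeff) ** 2 / n))
    return result


def top_periods(signal, count=2):
    """Periods (in positions, e.g. bp) of the strongest spectral peaks."""
    ranked = sorted(periodogram(signal), key=lambda pair: pair[1],
                    reverse=True)
    return [period for period, _ in ranked[:count]]
```

Applied to an affinity or curvature profile computed along a tandem array, the strongest peak would be expected at the monomer length, with a secondary peak near the nucleosome unit when such a periodicity is present.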