The main Haystack function, for higher-dimensional spaces and continuous expression levels.

haystack_continuous_highD(
  x,
  expression,
  grid.points = 100,
  weights.advanced.Q = NULL,
  dir.randomization = NULL,
  scale = TRUE,
  grid.method = "centroid",
  randomization.count = 100,
  n.genes.to.randomize = 100,
  selection.method.genes.to.randomize = "heavytails",
  grid.coord = NULL,
  genes.to.randomize = NULL,
  spline.method = "ns"
)

Arguments

x

Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space.

expression

A matrix with expression data of genes (rows) in cells (columns). The columns should correspond to the cells in the rows of x.

grid.points

An integer specifying the number of centers (grid points) used to estimate the density distributions of cells. Default: 100.

weights.advanced.Q

(Default: NULL) Optional weights of cells for calculating a weighted distribution of expression.

dir.randomization

If NULL (default), no output is written about the randomization step. If not NULL, files related to the randomizations are written to this directory.

scale

Logical (default: TRUE) indicating whether the input coordinates in x should be scaled to mean 0 and standard deviation 1.

grid.method

The method to decide grid points for estimating the density in the high-dimensional space. Should be "centroid" (default) or "seeding".

randomization.count

Number of randomizations to use. Default: 100.

n.genes.to.randomize

Number of genes to use in the randomizations. Default: 100.

selection.method.genes.to.randomize

Method used to select the genes used in the randomizations. Default: "heavytails".

grid.coord

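Optional matrix of grid point coordinates (default: NULL).

genes.to.randomize

Optional set of genes to use in the randomizations (default: NULL).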

spline.method

Method used for fitting splines: "ns" (default) for natural splines, or "bs" for B-splines.
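Because x stores cells as rows while expression stores cells as columns, the two inputs must agree on the number of cells. The following base R sketch, on hypothetical toy matrices, checks that and shows per-column standardization with scale(). We assume this matches what scale = TRUE does (mean 0 and standard deviation 1 per dimension); it is not the package's own code.

```r
# Minimal base R sketch on hypothetical toy data (not the package's code).
set.seed(1)
n.cells <- 200
n.genes <- 50

# Cells in rows, dimensions in columns (here a 3D embedding)
x <- matrix(rnorm(n.cells * 3, mean = 5, sd = 2), nrow = n.cells, ncol = 3)

# Genes in rows, cells in columns (the transposed orientation of x)
expression <- matrix(rexp(n.genes * n.cells), nrow = n.genes, ncol = n.cells)

# The number of cells must agree between the two inputs
stopifnot(nrow(x) == ncol(expression))

# Per-column standardization, assumed equivalent to scale = TRUE
x.scaled <- scale(x, center = TRUE, scale = TRUE)
colMeans(x.scaled)       # approximately 0 for every dimension
apply(x.scaled, 2, sd)   # 1 for every dimension
```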
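For grid.method = "centroid", one plausible reading is that the grid points are the centroids of a k-means clustering of the cells into grid.points clusters. The sketch assumes that reading, which this page does not confirm, and uses base R kmeans() on hypothetical data. It also reproduces the "No. of cells / 10" threshold from the warning in the example output on this page. The resulting matrix (one row per grid point, one column per dimension) is likewise our assumption about the shape that grid.coord expects.

```r
# Sketch, assuming grid.method = "centroid" means k-means centroids of the
# (scaled) cell coordinates. Hypothetical data; not the package's code.
set.seed(2)
n.cells <- 1000
x <- scale(matrix(rnorm(n.cells * 3), ncol = 3))  # hypothetical scaled coordinates

grid.points <- 100
# Mirrors the warning in this page's example output:
# more than (number of cells / 10) grid points is flagged as very high.
if (grid.points > n.cells / 10) {
  warning("'grid.points' appears to be very high (> No. of cells / 10)")
}

km <- kmeans(x, centers = grid.points, iter.max = 100)
grid.coord <- km$centers   # one row per grid point, one column per dimension
dim(grid.coord)            # 100 rows, 3 columns
```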
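The log messages in this page's example ("calculating Kullback-Leibler divergences", "performing randomizations", "estimating p-values") outline the overall scoring procedure. What follows is a simplified sketch of that idea for a single gene, not the package's implementation. Cells contribute to grid points through a Gaussian kernel with a hypothetical bandwidth of 0.5. A reference distribution Q is built from all cells, with an optional weights.Q vector playing the role we assume weights.advanced.Q plays. P is built from expression-weighted cells, and the null distribution comes from shuffling the gene's values across cells randomization.count times. The normal approximation used for the p-value is also an assumption.

```r
# Simplified D_KL score with a randomization-based null for one gene.
# All modelling choices here are assumptions, not the package's code.
set.seed(3)
n.cells <- 300
x <- matrix(rnorm(n.cells * 2), ncol = 2)          # hypothetical 2D coordinates
grid.coord <- kmeans(x, centers = 20)$centers

# Cell-to-grid-point density contributions (cells x grid points)
sq.dist <- outer(rowSums(x^2), rowSums(grid.coord^2), "+") - 2 * x %*% t(grid.coord)
bandwidth <- 0.5                                    # hypothetical kernel bandwidth
dens <- exp(-pmax(sq.dist, 0) / (2 * bandwidth^2))

# KL divergence of the expression-weighted distribution P from reference Q
d_kl <- function(expr, dens, weights.Q = rep(1, nrow(dens))) {
  Q <- colSums(dens * weights.Q)   # row i of dens is scaled by weights.Q[i]
  P <- colSums(dens * expr)        # row i of dens is scaled by expr[i]
  Q <- Q / sum(Q)
  P <- P / sum(P)
  keep <- P > 0
  sum(P[keep] * log(P[keep] / Q[keep]))
}

expr.flat   <- rep(1, n.cells)                      # identical in every cell
expr.biased <- ifelse(x[, 1] > 1, 5, 0.1)           # high in one region only
d_kl(expr.flat, dens)                               # 0: P equals Q

# Null distribution: shuffle the gene's values across cells
randomization.count <- 100
observed <- d_kl(expr.biased, dens)
null <- replicate(randomization.count, d_kl(sample(expr.biased), dens))

# log10 of the upper-tail p-value under a normal approximation of the null
log10.p <- pnorm(observed, mean = mean(null), sd = sd(null),
                 lower.tail = FALSE, log.p = TRUE) / log(10)
```

A gene with identical values in every cell gives a D_KL of 0, while a gene concentrated in one region scores well above its shuffled versions.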
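The example log reports picking a spline model for the mean and the standard deviation of D_KL, with a "best RMSD" and a "best df". The sketch picks the degrees of freedom by 5-fold cross-validated RMSD using splines::ns() or splines::bs(), matching the two spline.method options. The predictor covariate, the response mean.dkl, and the cross-validation criterion are hypothetical; this page does not say what the package regresses on or exactly how it computes RMSD.

```r
# Sketch: choose spline df by cross-validated RMSD ("ns" or "bs").
# Predictor, response and criterion are hypothetical stand-ins.
library(splines)
set.seed(4)
covariate <- runif(200, 0, 10)                        # hypothetical predictor
mean.dkl  <- log1p(covariate) + rnorm(200, sd = 0.1)  # hypothetical response

pick_spline_df <- function(xv, yv, dfs = 3:8, method = c("ns", "bs"), k = 5) {
  method <- match.arg(method)
  folds <- sample(rep(seq_len(k), length.out = length(xv)))
  rmsd <- sapply(dfs, function(d) {
    # Basis built once on all points, so knots stay fixed across folds
    basis <- if (method == "ns") ns(xv, df = d) else bs(xv, df = d)
    X <- cbind(1, basis)
    err <- unlist(lapply(seq_len(k), function(f) {
      test <- folds == f
      fit <- lm.fit(X[!test, , drop = FALSE], yv[!test])
      yv[test] - X[test, , drop = FALSE] %*% fit$coefficients
    }))
    sqrt(mean(err^2))
  })
  list(best.df = dfs[which.min(rmsd)], best.rmsd = min(rmsd))
}

pick_spline_df(covariate, mean.dkl, method = "ns")
pick_spline_df(covariate, mean.dkl, method = "bs")
```

Scoring on held-out folds rather than on the training data matters here: in-sample RMSD keeps shrinking as df grows, so it would always favour the largest df.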

Value

An object of class "haystack", including the results of the analysis and the coordinates of the grid points used to estimate densities.

Examples

# using the toy example of the singleCellHaystack package

# running haystack
res <- haystack(dat.tsne, dat.expression)
#> ### calling haystack_continuous_highD()...
#> ### Using package sparseMatrixStats to speed up statistics in sparse matrices.
#> ### Calculating row-wise mean and SD... 
#> ### Filtered 0 genes with zero variance...
#> ### Using 100 randomizations...
#> ### Using 100 genes to randomize...
#> Warning: The value of 'grid.points' appears to be very high (> No. of cells / 10). You can set the number of grid points using the 'grid.points' parameter.
#> ### scaling input data...
#> ### deciding grid points...
#> ### calculating Kullback-Leibler divergences...
#> ### performing randomizations...
#> ### estimating p-values...
#> ### picking model for mean D_KL...
#> ### using natural splines
#> ### best RMSD  : 0.09
#> ### best df    : 3
#> ### picking model for stdev D_KL...
#> ### using natural splines
#> ### best RMSD  : 0.018
#> ### best df    : 5
#> ### returning result...
# list top 10 biased genes
show_result_haystack(res, n=10)
#>              D_KL log.p.vals log.p.adj
#> gene_79  2.405028  -38.82043 -36.12146
#> gene_242 1.850754  -38.06052 -35.36155
#> gene_339 1.940697  -37.15175 -34.45278
#> gene_71  2.673197  -36.08386 -33.38489
#> gene_275 1.841230  -35.32151 -32.62254
#> gene_62  2.175734  -34.89847 -32.19950
#> gene_351 1.906705  -33.77042 -31.07145
#> gene_479 2.455483  -33.13287 -30.43390
#> gene_300 2.230951  -32.52041 -29.82144
#> gene_429 1.705867  -31.71053 -29.01156
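In the printed table, log.p.adj exceeds log.p.vals by the same amount for every gene, about 2.69897, which equals log10(500). This is consistent with both columns being log10 p-values and with a Bonferroni correction over 500 genes. That reading is our inference from the printed numbers, not something this page states. The sketch checks it with base R p.adjust().

```r
# Values copied from the show_result_haystack() output on this page
log.p.vals <- c(-38.82043, -38.06052, -37.15175, -36.08386, -35.32151,
                -34.89847, -33.77042, -33.13287, -32.52041, -31.71053)
log.p.adj  <- c(-36.12146, -35.36155, -34.45278, -33.38489, -32.62254,
                -32.19950, -31.07145, -30.43390, -29.82144, -29.01156)

log.p.adj - log.p.vals   # about 2.69897 for every gene
log10(500)               # about 2.69897

# Same numbers via a Bonferroni correction on the p-value scale,
# assuming 500 tested genes (an inference, not stated by the package)
bonf <- log10(p.adjust(10^log.p.vals, method = "bonferroni", n = 500))
```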