The main Haystack function, for higher-dimensional spaces and continuous expression levels.

haystack_continuous_highD(
  x,
  expression,
  grid.points = 100,
  weights.advanced.Q = NULL,
  dir.randomization = NULL,
  scale = TRUE,
  grid.method = "centroid",
  randomization.count = 100,
  n.genes.to.randomize = 100,
  selection.method.genes.to.randomize = "heavytails",
  grid.coord = NULL,
  genes.to.randomize = NULL,
  spline.method = "ns"
)

Arguments

x: Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space.
expression: A matrix of expression data, with genes in rows and cells in columns.
grid.points: An integer specifying the number of centers (grid points) to be used for estimating the density distributions of cells. Default is set to 100.
weights.advanced.Q: (Default: NULL) Optional weights of cells for calculating a weighted distribution of expression.
dir.randomization: If NULL (default), no output is written for the random sampling step. If not NULL, files related to the randomizations are written to this directory.
scale: Logical (default=TRUE) indicating whether input coordinates in x should be scaled to mean 0 and standard deviation 1.
grid.method: The method to decide grid points for estimating the density in the high-dimensional space. Should be "centroid" (default) or "seeding".
randomization.count: Number of randomizations to use. Default: 100.
n.genes.to.randomize: Number of genes to use in randomizations. Default: 100.
selection.method.genes.to.randomize: Method used to select genes for randomization. Default: "heavytails".
grid.coord: Optional matrix of grid coordinates. Default: NULL.
spline.method: Method to use for fitting splines: "ns" (default) for natural splines, or "bs" for B-splines.
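
The input layout described above can be sketched with synthetic data in base R. This is an illustration only: the matrices are random, the use of base `scale()` is an assumption about what `scale = TRUE` amounts to, and the cells/10 ceiling for `grid.points` is taken from the warning text shown in the Examples rather than from a documented rule.

```r
set.seed(1)
n.cells <- 200
n.genes <- 20
n.dims  <- 5

# x: cells in rows, dimensions of the space in columns
x <- matrix(rnorm(n.cells * n.dims, mean = 10, sd = 4),
            nrow = n.cells, ncol = n.dims)

# expression: genes in rows, cells in columns
expression <- matrix(rexp(n.genes * n.cells),
                     nrow = n.genes, ncol = n.cells)

# Both inputs must describe the same cells
stopifnot(nrow(x) == ncol(expression))

# Assumed equivalent of scale = TRUE: each column of x is centred
# to mean 0 and scaled to standard deviation 1
x.scaled <- scale(x)
stopifnot(all(abs(colMeans(x.scaled)) < 1e-10))
stopifnot(all(abs(apply(x.scaled, 2, sd) - 1) < 1e-10))

# Keep grid.points at or below (number of cells / 10), following the
# threshold mentioned in the warning message, capped at the default of 100
n.grid <- min(100, floor(nrow(x) / 10))
n.grid
```

With inputs shaped like this, `n.grid` could be passed as `grid.points` when the default of 100 is too high for the number of cells.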

Value

An object of class "haystack", containing the results of the analysis and the coordinates of the grid points used to estimate densities.

Examples

# using the toy example of the singleCellHaystack package

# running haystack
res <- haystack(dat.tsne, dat.expression)
#> ### calling haystack_continuous_highD()...
#> ### Using package sparseMatrixStats to speed up statistics in sparse matrices.
#> ### Calculating row-wise mean and SD... 
#> ### Filtered 0 genes with zero variance...
#> ### Using 100 randomizations...
#> ### Using 100 genes to randomize...
#> Warning: The value of 'grid.points' appears to be very high (> No. of cells / 10). You can set the number of grid points using the 'grid.points' parameter.
#> ### scaling input data...
#> ### deciding grid points...
#> ### calculating Kullback-Leibler divergences...
#> ### performing randomizations...
#> ### estimating p-values...
#> ### picking model for mean D_KL...
#> ### using natural splines
#> ### best RMSD  : 0.09
#> ### best df    : 3
#> ### picking model for stdev D_KL...
#> ### using natural splines
#> ### best RMSD  : 0.018
#> ### best df    : 5
#> ### returning result...
# list top 10 biased genes
show_result_haystack(res, n=10)
#>              D_KL log.p.vals log.p.adj
#> gene_79  2.405028  -38.82043 -36.12146
#> gene_242 1.850754  -38.06052 -35.36155
#> gene_339 1.940697  -37.15175 -34.45278
#> gene_71  2.673197  -36.08386 -33.38489
#> gene_275 1.841230  -35.32151 -32.62254
#> gene_62  2.175734  -34.89847 -32.19950
#> gene_351 1.906705  -33.77042 -31.07145
#> gene_479 2.455483  -33.13287 -30.43390
#> gene_300 2.230951  -32.52041 -29.82144
#> gene_429 1.705867  -31.71053 -29.01156
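
The relationship between the `log.p.vals` and `log.p.adj` columns can be checked with a little arithmetic. The sketch below copies the first three rows of the table above and shows that the two columns differ by a constant of about 2.69897, which equals log10(500). That is consistent with a Bonferroni-style correction over 500 tests on a log10 scale, but this is an inference from the printed numbers and not something stated in this page. The variable names `offset`, `n.tests` and `recon` are chosen for illustration.

```r
# Values copied from the first three rows of the output above
log.p.vals <- c(gene_79 = -38.82043, gene_242 = -38.06052, gene_339 = -37.15175)
log.p.adj  <- c(gene_79 = -36.12146, gene_242 = -35.36155, gene_339 = -34.45278)

# The difference between the two columns is the same for every gene
offset <- log.p.adj - log.p.vals
offset

# On a log10 scale, a constant offset corresponds to multiplying each
# p-value by a fixed number of tests (assumed Bonferroni-style correction)
n.tests <- round(10^mean(offset))
n.tests

# Rebuild the adjusted values under that assumption; adjusted p-values
# cannot exceed 1, so the log10 value is capped at 0
recon <- pmin(log.p.vals + log10(n.tests), 0)
all.equal(unname(recon), unname(log.p.adj), tolerance = 1e-6)
```

Under this reading, more negative values indicate stronger evidence that a gene's expression is non-randomly distributed in the input space.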