API Reference (Python)¶
count_binary_matrices¶
- zagar.count_binary_matrices(p, q, sort_p_desc=False)¶
Count Binary Matrices with Row Sums \(\boldsymbol{p}\) and Column Sums \(\boldsymbol{q}\).
The underlying algorithm is a minor modification of the original EXACT algorithm implemented in Miller and Harrison (2013) The Annals of Statistics. Compared to their original C version, this version uses modern C++ optimizations (e.g.,
std::unordered_mapin C++ versuskhashfromklibfor hashing) and performs operations on flexible integer types, which may be more efficient when the problem size is not too large (e.g.,uint64_trather than arbitrary precision types for smaller input values).- Parameters:
p (list of int) – Row sums (nonnegative integers)
q (list of int) – Column sums (nonnegative integers)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed (default: False)
- Returns:
{'number': int, 'memo_entries': int}number: The count of binary matrices (arbitrary precision)memo_entries: Number of entries in the memoization table
- Return type:
dict
Examples
>>> import zagar >>> result = zagar.count_binary_matrices([2, 1], [1, 1, 1]) >>> result['number'] 3
count_matrices¶
- zagar.count_matrices(p, q, sort_p_desc=False)¶
Wrapper for
count_binary_matricesThis is a wrapper for
count_binary_matricesthat returns just the count.- Parameters:
p (list or array-like) – Row sums (nonnegative integers)
q (list or array-like) – Column sums (nonnegative integers)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed optimization
- Returns:
Number of binary matrices with the given marginals
- Return type:
int
Examples
>>> import zagar >>> zagar.count_matrices([2, 1], [1, 1, 1]) 3
getA12¶
- zagar.getA12(N, a_vec, phi0_vec, phi1_vec, sort_p_desc=True, return_approx=False)¶
Cardinality of Doubly Constrained Admixed Arrays
Uses a modified version of Miller and Harrison’s EXACT algorithm to compute the size of a set of admixed arrays with specified global ancestry vector \(\boldsymbol{a}\) (
a_vec, a row sum constraint) and ancestry-specific allele frequencies \((\boldsymbol{\phi}_0,\boldsymbol{\phi}_1)\) (phi0_vecandphi1_vec, column sum constraint).Optionally, it can return an approximate count and upper bound (in base two logarithm scale), based on a saddle-point approximation. Currently only works for the following special regular case:
phi0_vecandphi1_vecentries all equal toNa_vecvalues all equal to the size of the listphi0_vec
- Parameters:
N (int) – Positive integer parameter (number of individuals)
a_vec (list of int) – Row sums (length N, all equal to P for regular case)
phi0_vec (list of int) – First constraint vector (length P)
phi1_vec (list of int) – Second constraint vector (must have same length as phi0_vec)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed (default: True)
return_approx (bool, optional) –
If True and inputs form a “regular case” (all a_vec equal P, all phi0_vec equal N, all phi1_vec equal N), return a dict with: ‘count’ (exact count), ‘is_regular’ (True), ‘log2_upper_bound’ (upper bound on log2(count)), and ‘log2_approx’ (approximation valid when N and P are large with bounded ratio).
If False or not regular case, return just the count (default: False)
- Returns:
If return_approx=False: The computed \(|\mathscr{A}_{12}|\) value (arbitrary precision int)
If return_approx=True: dict with count and approximation info
- Return type:
int or dict
Examples
>>> import zagar >>> # Basic usage >>> zagar.getA12(N=5, a_vec=[2, 2], phi0_vec=[1, 1], phi1_vec=[1, 1]) 7632 >>> # With approximation for regular case >>> result = zagar.getA12(N=4, a_vec=[4]*4, phi0_vec=[4]*4, phi1_vec=[4]*4, return_approx=True) >>> result['count'] 824410 >>> result['log2_approx'] # Approximation of log2(count) 19.6675
getA12_parallel¶
- zagar.getA12_parallel(N, a_vec, phi0_vec, phi1_vec, n_threads=-1, sort_p_desc=True, return_approx=False)¶
Cardinality of Doubly Constrained Admixed Arrays (Parallelized)
This is a parallelized version of
getA12, which relies on OpenMP. If the user already has OpenMP, this function should be used in place ofgetA12to speed up calculations.Parallelization is achieved by assigning workers to each value of the first coordinate of the constrained set \(\mathscr{S}\).
- Parameters:
N (int) – Positive integer parameter (number of individuals)
a_vec (list of int) – Row sums (length N, typically all equal to P for regular case)
phi0_vec (list of int) – First constraint vector (length P)
phi1_vec (list of int) – Second constraint vector (must have same length as phi0_vec)
n_threads (int, optional) – Number of threads (-1 = use all available, default: -1)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed (default: True)
return_approx (bool, optional) –
If True and inputs form a “regular case” (all a_vec equal P, all phi0_vec equal N, all phi1_vec equal N), return a dict with:
’count’: exact count
’is_regular’: True
’log2_upper_bound’: upper bound on log2(count)
’log2_approx’: approximation valid when N and P are large with bounded ratio
If False or not regular case, returns just the count (default: False)
- Returns:
If return_approx=False: The computed \(|\mathscr{A}_{12}|\) value (arbitrary precision int)
If return_approx=True: dict with count and approximation info
- Return type:
int or dict
Note
This function uses OpenMP for parallelization when available. If OpenMP is not available at compile time, it falls back to serial execution.
Examples
>>> import zagar >>> # Basic usage >>> zagar.getA12_parallel(N=5, a_vec=[2, 2], phi0_vec=[1, 1], phi1_vec=[1, 1]) 7632 >>> # With approximation for regular case >>> result = zagar.getA12_parallel(N=7, a_vec=[7]*7, phi0_vec=[7]*7, phi1_vec=[7]*7, return_approx=True) >>> result['count'] 1014691016274501451776 >>> result['is_regular'] True
getA1¶
- zagar.getA1(N, P, a_vec, return_log=False)¶
Cardinality of \(\mathscr{A}_1\): Number of Arrays Subject to Row Margin Constraints
Formula: prod(C(2*P, a_n)) * 2^(2*N*P) for all a_n in a_vec
Mathematically, \(2^{2NP} \prod_{n=1}^N \binom{2P}{a_{n\cdot}}\) for all \(a_{n\cdot}\) in \(\boldsymbol{a}\)
- Parameters:
N (int) – Positive integer (number of rows)
P (int) – Positive integer (number of columns)
a_vec (list of int) – Row margin constraints (length N, values in [0, 2*P])
return_log (bool, optional) – If True, return log2 of result as float (default: False)
- Returns:
If return_log=False: The exact \(|\mathscr{A}_1|\) value (arbitrary precision)
If return_log=True: \(\log_2(|\mathscr{A}_1|)\) as a float
- Return type:
int or float
Examples
>>> import zagar >>> zagar.getA1(N=50, P=50, a_vec=[25]*50) >>> zagar.getA1(N=50, P=50, a_vec=[25]*50, return_log=True)
getA2¶
- zagar.getA2(N, P, phi0_vec, phi1_vec, return_log=False)¶
Cardinality of \(\mathscr{A}_2\): Number of Arrays Subject to Column Margin Constraints
Formula: prod(C(2*N, phi0_p+phi1_p) * C(phi0_j+phi1_p, phi1_p) * 2^(2*N-(phi0_p+phi1_p)))
Mathematically, \(\prod_{p=1}^P \binom{2N}{\phi_{p,0}+\phi_{p,1}} \binom{\phi_{p,0}+\phi_{p,1}}{\phi_{p,0}} 2^{2N - (\phi_{p,0}+\phi_{p,1})}\)
- Parameters:
N (int) – Positive integer (number of rows)
P (int) – Positive integer (number of columns)
phi0_vec (list of int) – First constraint vector (length P, values in [0, 2*N])
phi1_vec (list of int) – Second constraint vector (length P, phi0_j+phi1_j <= 2*N)
return_log (bool, optional) – If True, return log2 of result as float (default: False)
- Returns:
If return_log=False: The exact \(|\mathscr{A}_2|\) value (arbitrary precision)
If return_log=True: \(\log_2(|\mathscr{A}_2|)\) as a float
- Return type:
int or float
Examples
>>> import zagar >>> zagar.getA2(N=50, P=50, phi0_vec=[20]*50, phi1_vec=[22]*50) >>> zagar.getA2(N=50, P=50, phi0_vec=[20]*50, phi1_vec=[22]*50, return_log=True)
compareA1A2¶
- zagar.compareA1A2(a_vec, phi0_vec, phi1_vec, use_large_NP=False)¶
Compare Cardinalities of \(\mathscr{A}_1\) and \(\mathscr{A}_2\)
This function computes \(|\mathscr{A}_1|\) and \(|\mathscr{A}_2|\) and compares their sizes. Optionally, it can also compute an asymptotic approximation for large \(N\), \(P\) based on entropy calculations.
- Parameters:
a_vec (list of int) – Row margin constraints (length N)
phi0_vec (list of int) – First constraint vector (length P)
phi1_vec (list of int) – Second constraint vector (length P, same length as phi0_vec)
use_large_NP (bool, optional) – If True, compute asymptotic comparison using entropy criterion (default: False)
- Returns:
Dictionary with keys:
'log2_A1': float - log2 of \(|\mathscr{A}_1|\)'log2_A2': float - log2 of \(|\mathscr{A}_2|\)'outcome': str - “A1 is larger” or “A2 is at least as large”
If use_large_NP=True, also includes:
'large_NP_stat': float - the asymptotic criterion value (LHS - RHS)'large_NP_outcome': str - “A1 is roughly larger” or “A2 is roughly at least as large”
- Return type:
dict
Note
- The asymptotic condition for A1 > A2 (when use_large_NP=True) is:
H(
a_bar) > H(f0,f1, 1-f0-f1) - (f0+f1)- where:
a_bar=a_vec/ (2*P), the normalized row marginsf0=phi0_vec/ (2*N),f1=phi1_vec/ (2*N), the normalized column marginsH() denotes entropy
Examples
>>> import zagar >>> result = zagar.compareA1A2([25]*50, [20]*50, [22]*50) >>> print(result['outcome']) >>> result = zagar.compareA1A2([25]*50, [20]*50, [22]*50, use_large_NP=True) >>> print(result['large_NP_stat'])
get_constraints¶
- zagar.get_constraints(A, X)¶
Compute Marginal Constraint Vectors from Admixed Array
Given binary matrices \(\mathbf{A}\) (ancestry) and \(\mathbf{X}\) (allele dosage) that make up the admixed array \([\mathbf{A},\mathbf{X}]\), computes three marginal constraint vectors used in constrained admixed array enumeration.
- Parameters:
A (np.ndarray) – Binary matrix of shape (N, 2P) representing ancestry assignments. A[n, p] = 1 means individual n has ancestry 1 at position p. May contain np.nan values for missing data.
X (np.ndarray) – Binary matrix of shape (N, 2P) representing allele values. X[n, p] = 1 means individual n has allele 1 at position p. May contain np.nan values for missing data.
- Returns:
Tuple of three lists:
a_vec: list of int - Length N. Entry n is the row sum of A (total ancestry-1 count for individual n). This is the row margin constraint for \(\mathscr{A}_1\). NA values in A are ignored.phi0_vec: list of int - Length P. Entry p is the sum of X values at positions where A=0, across columns p and P+p for all individuals. This is the first column margin constraint for \(\mathscr{A}_2\).phi1_vec: list of int - Length P. Entry p is the sum of X values at positions where A=1, across columns p and P+p for all individuals. This is the second column margin constraint for \(\mathscr{A}_2\).
- Return type:
tuple of (list, list, list)
Note
If either A or X contains NA/NaN values, a warning is issued but computation continues. NA values are ignored in all sums (they contribute 0 to the totals).
Examples
>>> import numpy as np >>> import zagar >>> np.random.seed(42) >>> N, P = 6, 5 >>> A = np.random.randint(0, 2, size=(N, 2*P)).astype(float) >>> X = np.random.randint(0, 2, size=(N, 2*P)).astype(float) >>> a_vec, phi0_vec, phi1_vec = zagar.get_constraints(A, X) >>> len(a_vec) == N True >>> len(phi0_vec) == P and len(phi1_vec) == P True >>> # With missing data >>> A[0, 0] = np.nan >>> X[1, 2] = np.nan >>> a_vec, phi0_vec, phi1_vec = zagar.get_constraints(A, X) # warns but continues
has_openmp¶
- zagar.has_openmp()¶
Check if OpenMP support is available.
Returns True if the package was compiled with OpenMP support, False otherwise. When OpenMP is not available, getA12_parallel falls back to serial execution.
- Returns:
True if OpenMP is available, False otherwise.
- Return type:
bool
Examples
>>> import zagar >>> zagar.has_openmp() False # or True if compiled with OpenMP
Module Attributes¶
- zagar.__version__¶
Current version of the zagar package.
>>> import zagar >>> zagar.__version__ '0.1.0'