API Reference (Python)¶

count_binary_matrices¶

zagar.count_binary_matrices(p, q, sort_p_desc=False)¶

Count Binary Matrices with Row Sums \(\boldsymbol{p}\) and Column Sums \(\boldsymbol{q}\).

The underlying algorithm is a minor modification of the original EXACT algorithm implemented in Miller and Harrison (2013) The Annals of Statistics. Compared to their original C version, this version uses modern C++ optimizations (e.g., std::unordered_map in C++ versus khash from klib for hashing) and performs operations on flexible integer types, which may be more efficient when the problem size is not too large (e.g., uint64_t rather than arbitrary precision types for smaller input values).

Parameters:

p (list of int) – Row sums (nonnegative integers)
q (list of int) – Column sums (nonnegative integers)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed (default: False)

Returns:

{'number': int, 'memo_entries': int}

number: The count of binary matrices (arbitrary precision)
memo_entries: Number of entries in the memoization table

Return type:

dict

Examples

>>> import zagar
>>> result = zagar.count_binary_matrices([2, 1], [1, 1, 1])
>>> result['number']
3

count_matrices¶

zagar.count_matrices(p, q, sort_p_desc=False)¶

Wrapper for count_binary_matrices

This is a wrapper for count_binary_matrices that returns just the count.

Parameters:

p (list or array-like) – Row sums (nonnegative integers)
q (list or array-like) – Column sums (nonnegative integers)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed optimization

Returns:

Number of binary matrices with the given marginals

Return type:

int

Examples

>>> import zagar
>>> zagar.count_matrices([2, 1], [1, 1, 1])
3

getA12¶

zagar.getA12(N, a_vec, phi0_vec, phi1_vec, sort_p_desc=True, return_approx=False)¶

Cardinality of Doubly Constrained Admixed Arrays

Uses a modified version of Miller and Harrison’s EXACT algorithm to compute the size of a set of admixed arrays with specified global ancestry vector \(\boldsymbol{a}\) (a_vec, a row sum constraint) and ancestry-specific allele frequencies \((\boldsymbol{\phi}_0,\boldsymbol{\phi}_1)\) (phi0_vec and phi1_vec, column sum constraint).

Optionally, it can return an approximate count and upper bound (in base two logarithm scale), based on a saddle-point approximation. Currently only works for the following special regular case:

phi0_vec and phi1_vec entries all equal to N
a_vec values all equal to the size of the list phi0_vec

Parameters:

N (int) – Positive integer parameter (number of individuals)
a_vec (list of int) – Row sums (length N, all equal to P for regular case)
phi0_vec (list of int) – First constraint vector (length P)
phi1_vec (list of int) – Second constraint vector (must have same length as phi0_vec)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed (default: True)
return_approx (bool, optional) –
- If True and inputs form a “regular case” (all a_vec equal P, all phi0_vec equal N, all phi1_vec equal N), return a dict with: ‘count’ (exact count), ‘is_regular’ (True), ‘log2_upper_bound’ (upper bound on log2(count)), and ‘log2_approx’ (approximation valid when N and P are large with bounded ratio).
- If False or not regular case, return just the count (default: False)

Returns:

If return_approx=False: The computed \(|\mathscr{A}_{12}|\) value (arbitrary precision int)

If return_approx=True: dict with count and approximation info

Return type:

int or dict

Examples

>>> import zagar
>>> # Basic usage
>>> zagar.getA12(N=5, a_vec=[2, 2], phi0_vec=[1, 1], phi1_vec=[1, 1])
7632
>>> # With approximation for regular case
>>> result = zagar.getA12(N=4, a_vec=[4]*4, phi0_vec=[4]*4, phi1_vec=[4]*4, return_approx=True)
>>> result['count']
824410
>>> result['log2_approx']  # Approximation of log2(count)
19.6675

getA12_parallel¶

zagar.getA12_parallel(N, a_vec, phi0_vec, phi1_vec, n_threads=-1, sort_p_desc=True, return_approx=False)¶

Cardinality of Doubly Constrained Admixed Arrays (Parallelized)

This is a parallelized version of getA12, which relies on OpenMP. If the user already has OpenMP, this function should be used in place of getA12 to speed up calculations.

Parallelization is achieved by assigning workers to each value of the first coordinate of the constrained set \(\mathscr{S}\).

Parameters:

N (int) – Positive integer parameter (number of individuals)
a_vec (list of int) – Row sums (length N, typically all equal to P for regular case)
phi0_vec (list of int) – First constraint vector (length P)
phi1_vec (list of int) – Second constraint vector (must have same length as phi0_vec)
n_threads (int, optional) – Number of threads (-1 = use all available, default: -1)
sort_p_desc (bool, optional) – If True, sort p in decreasing order for speed (default: True)
return_approx (bool, optional) –
If True and inputs form a “regular case” (all a_vec equal P, all phi0_vec equal N, all phi1_vec equal N), return a dict with:
- ’count’: exact count
- ’is_regular’: True
- ’log2_upper_bound’: upper bound on log2(count)
- ’log2_approx’: approximation valid when N and P are large with bounded ratio
If False or not regular case, returns just the count (default: False)

Returns:

If return_approx=False: The computed \(|\mathscr{A}_{12}|\) value (arbitrary precision int)

If return_approx=True: dict with count and approximation info

Return type:

int or dict

Note

This function uses OpenMP for parallelization when available. If OpenMP is not available at compile time, it falls back to serial execution.

Examples

>>> import zagar
>>> # Basic usage
>>> zagar.getA12_parallel(N=5, a_vec=[2, 2], phi0_vec=[1, 1], phi1_vec=[1, 1])
7632
>>> # With approximation for regular case
>>> result = zagar.getA12_parallel(N=7, a_vec=[7]*7, phi0_vec=[7]*7, phi1_vec=[7]*7, return_approx=True)
>>> result['count']
1014691016274501451776
>>> result['is_regular']
True

getA1¶

zagar.getA1(N, P, a_vec, return_log=False)¶

Cardinality of \(\mathscr{A}_1\): Number of Arrays Subject to Row Margin Constraints

Formula: prod(C(2*P, a_n)) * 2^(2*N*P) for all a_n in a_vec

Mathematically, \(2^{2NP} \prod_{n=1}^N \binom{2P}{a_{n\cdot}}\) for all \(a_{n\cdot}\) in \(\boldsymbol{a}\)

Parameters:

N (int) – Positive integer (number of rows)
P (int) – Positive integer (number of columns)
a_vec (list of int) – Row margin constraints (length N, values in [0, 2*P])
return_log (bool, optional) – If True, return log2 of result as float (default: False)

Returns:

If return_log=False: The exact \(|\mathscr{A}_1|\) value (arbitrary precision)

If return_log=True: \(\log_2(|\mathscr{A}_1|)\) as a float

Return type:

int or float

Examples

>>> import zagar
>>> zagar.getA1(N=50, P=50, a_vec=[25]*50)
>>> zagar.getA1(N=50, P=50, a_vec=[25]*50, return_log=True)

getA2¶

zagar.getA2(N, P, phi0_vec, phi1_vec, return_log=False)¶

Cardinality of \(\mathscr{A}_2\): Number of Arrays Subject to Column Margin Constraints

Formula: prod(C(2*N, phi0_p+phi1_p) * C(phi0_j+phi1_p, phi1_p) * 2^(2*N-(phi0_p+phi1_p)))

Mathematically, \(\prod_{p=1}^P \binom{2N}{\phi_{p,0}+\phi_{p,1}} \binom{\phi_{p,0}+\phi_{p,1}}{\phi_{p,0}} 2^{2N - (\phi_{p,0}+\phi_{p,1})}\)

Parameters:

N (int) – Positive integer (number of rows)
P (int) – Positive integer (number of columns)
phi0_vec (list of int) – First constraint vector (length P, values in [0, 2*N])
phi1_vec (list of int) – Second constraint vector (length P, phi0_j+phi1_j <= 2*N)
return_log (bool, optional) – If True, return log2 of result as float (default: False)

Returns:

If return_log=False: The exact \(|\mathscr{A}_2|\) value (arbitrary precision)

If return_log=True: \(\log_2(|\mathscr{A}_2|)\) as a float

Return type:

int or float

Examples

>>> import zagar
>>> zagar.getA2(N=50, P=50, phi0_vec=[20]*50, phi1_vec=[22]*50)
>>> zagar.getA2(N=50, P=50, phi0_vec=[20]*50, phi1_vec=[22]*50, return_log=True)

compareA1A2¶

zagar.compareA1A2(a_vec, phi0_vec, phi1_vec, use_large_NP=False)¶

Compare Cardinalities of \(\mathscr{A}_1\) and \(\mathscr{A}_2\)

This function computes \(|\mathscr{A}_1|\) and \(|\mathscr{A}_2|\) and compares their sizes. Optionally, it can also compute an asymptotic approximation for large \(N\), \(P\) based on entropy calculations.

Parameters:

a_vec (list of int) – Row margin constraints (length N)
phi0_vec (list of int) – First constraint vector (length P)
phi1_vec (list of int) – Second constraint vector (length P, same length as phi0_vec)
use_large_NP (bool, optional) – If True, compute asymptotic comparison using entropy criterion (default: False)

Returns:

Dictionary with keys:

'log2_A1': float - log2 of \(|\mathscr{A}_1|\)
'log2_A2': float - log2 of \(|\mathscr{A}_2|\)
'outcome': str - “A1 is larger” or “A2 is at least as large”

If use_large_NP=True, also includes:

'large_NP_stat': float - the asymptotic criterion value (LHS - RHS)
'large_NP_outcome': str - “A1 is roughly larger” or “A2 is roughly at least as large”

Return type:

dict

Note

The asymptotic condition for A1 > A2 (when use_large_NP=True) is:

H(a_bar) > H(f0, f1, 1-f0-f1) - (f0 + f1)

where:

a_bar = a_vec / (2*P), the normalized row margins
f0 = phi0_vec / (2*N), f1 = phi1_vec / (2*N), the normalized column margins
H() denotes entropy

Examples

>>> import zagar
>>> result = zagar.compareA1A2([25]*50, [20]*50, [22]*50)
>>> print(result['outcome'])
>>> result = zagar.compareA1A2([25]*50, [20]*50, [22]*50, use_large_NP=True)
>>> print(result['large_NP_stat'])

get_constraints¶

zagar.get_constraints(A, X)¶

Compute Marginal Constraint Vectors from Admixed Array

Given binary matrices \(\mathbf{A}\) (ancestry) and \(\mathbf{X}\) (allele dosage) that make up the admixed array \([\mathbf{A},\mathbf{X}]\), computes three marginal constraint vectors used in constrained admixed array enumeration.

Parameters:

A (np.ndarray) – Binary matrix of shape (N, 2P) representing ancestry assignments. A[n, p] = 1 means individual n has ancestry 1 at position p. May contain np.nan values for missing data.
X (np.ndarray) – Binary matrix of shape (N, 2P) representing allele values. X[n, p] = 1 means individual n has allele 1 at position p. May contain np.nan values for missing data.

Returns:

Tuple of three lists:

a_vec: list of int - Length N. Entry n is the row sum of A (total ancestry-1 count for individual n). This is the row margin constraint for \(\mathscr{A}_1\). NA values in A are ignored.
phi0_vec: list of int - Length P. Entry p is the sum of X values at positions where A=0, across columns p and P+p for all individuals. This is the first column margin constraint for \(\mathscr{A}_2\).
phi1_vec: list of int - Length P. Entry p is the sum of X values at positions where A=1, across columns p and P+p for all individuals. This is the second column margin constraint for \(\mathscr{A}_2\).

Return type:

tuple of (list, list, list)

Note

If either A or X contains NA/NaN values, a warning is issued but computation continues. NA values are ignored in all sums (they contribute 0 to the totals).

Examples

>>> import numpy as np
>>> import zagar
>>> np.random.seed(42)
>>> N, P = 6, 5
>>> A = np.random.randint(0, 2, size=(N, 2*P)).astype(float)
>>> X = np.random.randint(0, 2, size=(N, 2*P)).astype(float)
>>> a_vec, phi0_vec, phi1_vec = zagar.get_constraints(A, X)
>>> len(a_vec) == N
True
>>> len(phi0_vec) == P and len(phi1_vec) == P
True

>>> # With missing data
>>> A[0, 0] = np.nan
>>> X[1, 2] = np.nan
>>> a_vec, phi0_vec, phi1_vec = zagar.get_constraints(A, X)  # warns but continues

has_openmp¶

zagar.has_openmp()¶

Check if OpenMP support is available.

Returns True if the package was compiled with OpenMP support, False otherwise. When OpenMP is not available, getA12_parallel falls back to serial execution.

Returns:: True if OpenMP is available, False otherwise.
Return type:: bool

Examples

>>> import zagar
>>> zagar.has_openmp()
False  # or True if compiled with OpenMP

Module Attributes¶

zagar.__version__¶

Current version of the zagar package.

>>> import zagar
>>> zagar.__version__
'0.1.0'