API Reference (Python)
======================

.. module:: zagar

count_binary_matrices
---------------------

.. function:: count_binary_matrices(p, q, sort_p_desc=False)

   Count Binary Matrices with Row Sums :math:`\boldsymbol{p}` and Column Sums :math:`\boldsymbol{q}`.

   The underlying algorithm is a minor modification of the original EXACT algorithm implemented
   in Miller and Harrison (2013) *The Annals of Statistics*. Compared to their original C version,
   this version uses modern C++ optimizations (e.g., ``std::unordered_map`` in C++
   versus ``khash`` from ``klib`` for hashing) and performs operations on flexible
   integer types, which may be more efficient when the problem size is not too large
   (e.g., ``uint64_t`` rather than arbitrary precision types for smaller input values).

   :param p: Row sums (nonnegative integers)
   :type p: list of int
   :param q: Column sums (nonnegative integers)
   :type q: list of int
   :param sort_p_desc: If True, sort p in decreasing order for speed (default: False)
   :type sort_p_desc: bool, optional
   :returns: ``{'number': int, 'memo_entries': int}``

      - ``number``: The count of binary matrices (arbitrary precision)
      - ``memo_entries``: Number of entries in the memoization table

   :rtype: dict

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> result = zagar.count_binary_matrices([2, 1], [1, 1, 1])
      >>> result['number']
      3


count_matrices
--------------

.. function:: count_matrices(p, q, sort_p_desc=False)

   Wrapper for ``count_binary_matrices``

   This is a wrapper for ``count_binary_matrices`` that returns just the count.

   :param p: Row sums (nonnegative integers)
   :type p: list or array-like
   :param q: Column sums (nonnegative integers)
   :type q: list or array-like
   :param sort_p_desc: If True, sort p in decreasing order for speed optimization
   :type sort_p_desc: bool, optional
   :returns: Number of binary matrices with the given marginals
   :rtype: int

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> zagar.count_matrices([2, 1], [1, 1, 1])
      3


getA12
------

.. function:: getA12(N, a_vec, phi0_vec, phi1_vec, sort_p_desc=True, return_approx=False)

   Cardinality of Doubly Constrained Admixed Arrays

   Uses a modified version of Miller and Harrison's EXACT algorithm
   to compute the size of a set of admixed arrays with specified
   global ancestry vector :math:`\boldsymbol{a}` (``a_vec``, a row
   sum constraint) and ancestry-specific allele frequencies
   :math:`(\boldsymbol{\phi}_0,\boldsymbol{\phi}_1)` (``phi0_vec`` and
   ``phi1_vec``, column sum constraint).

   Optionally, it can return an approximate count and upper bound
   (in base two logarithm scale), based on a saddle-point approximation.
   Currently only works for the following special regular case:

   - ``phi0_vec`` and ``phi1_vec`` entries all equal to ``N``
   - ``a_vec`` values all equal to the size of the list ``phi0_vec``

   :param N: Positive integer parameter (number of individuals)
   :type N: int
   :param a_vec: Row sums (length N, all equal to P for regular case)
   :type a_vec: list of int
   :param phi0_vec: First constraint vector (length P)
   :type phi0_vec: list of int
   :param phi1_vec: Second constraint vector (must have same length as phi0_vec)
   :type phi1_vec: list of int
   :param sort_p_desc: If True, sort p in decreasing order for speed (default: True)
   :type sort_p_desc: bool, optional
   :param return_approx:
      - If True and inputs form a "regular case" (all a_vec equal P, all phi0_vec equal N, all phi1_vec equal N),
        return a dict with: 'count' (exact count), 'is_regular' (True),
        'log2_upper_bound' (upper bound on log2(count)), and 'log2_approx'
        (approximation valid when N and P are large with bounded ratio).
      - If False or not regular case, return just the count (default: False)
   :type return_approx: bool, optional
   :returns:
      If return_approx=False: The computed :math:`|\mathscr{A}_{12}|` value (arbitrary precision int)

      If return_approx=True: dict with count and approximation info
   :rtype: int or dict

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> # Basic usage
      >>> zagar.getA12(N=5, a_vec=[2, 2], phi0_vec=[1, 1], phi1_vec=[1, 1])
      7632
      >>> # With approximation for regular case
      >>> result = zagar.getA12(N=4, a_vec=[4]*4, phi0_vec=[4]*4, phi1_vec=[4]*4, return_approx=True)
      >>> result['count']
      824410
      >>> result['log2_approx']  # Approximation of log2(count)
      19.6675


getA12_parallel
---------------

.. function:: getA12_parallel(N, a_vec, phi0_vec, phi1_vec, n_threads=-1, sort_p_desc=True, return_approx=False)

   Cardinality of Doubly Constrained Admixed Arrays (Parallelized)

   This is a parallelized version of ``getA12``, which relies on
   OpenMP. If the user already has OpenMP, this function should be used
   in place of ``getA12`` to speed up calculations.

   Parallelization is achieved by assigning workers to each value of the
   first coordinate of the constrained set :math:`\mathscr{S}`.

   :param N: Positive integer parameter (number of individuals)
   :type N: int
   :param a_vec: Row sums (length N, typically all equal to P for regular case)
   :type a_vec: list of int
   :param phi0_vec: First constraint vector (length P)
   :type phi0_vec: list of int
   :param phi1_vec: Second constraint vector (must have same length as phi0_vec)
   :type phi1_vec: list of int
   :param n_threads: Number of threads (-1 = use all available, default: -1)
   :type n_threads: int, optional
   :param sort_p_desc: If True, sort p in decreasing order for speed (default: True)
   :type sort_p_desc: bool, optional
   :param return_approx: If True and inputs form a "regular case" (all a_vec equal P,
      all phi0_vec equal N, all phi1_vec equal N), return a dict with:

      - 'count': exact count
      - 'is_regular': True
      - 'log2_upper_bound': upper bound on log2(count)
      - 'log2_approx': approximation valid when N and P are large with bounded ratio

      If False or not regular case, returns just the count (default: False)
   :type return_approx: bool, optional
   :returns:
      If return_approx=False: The computed :math:`|\mathscr{A}_{12}|` value (arbitrary precision int)

      If return_approx=True: dict with count and approximation info
   :rtype: int or dict

   .. note::

      This function uses OpenMP for parallelization when available.
      If OpenMP is not available at compile time, it falls back to serial execution.

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> # Basic usage
      >>> zagar.getA12_parallel(N=5, a_vec=[2, 2], phi0_vec=[1, 1], phi1_vec=[1, 1])
      7632
      >>> # With approximation for regular case
      >>> result = zagar.getA12_parallel(N=7, a_vec=[7]*7, phi0_vec=[7]*7, phi1_vec=[7]*7, return_approx=True)
      >>> result['count']
      1014691016274501451776
      >>> result['is_regular']
      True


getA1
-----

.. function:: getA1(N, P, a_vec, return_log=False)

   Cardinality of :math:`\mathscr{A}_1`: Number of Arrays Subject to Row Margin Constraints

   Formula: prod(C(2*P, a_n)) * 2^(2*N*P) for all a_n in a_vec

   Mathematically,
   :math:`2^{2NP} \prod_{n=1}^N \binom{2P}{a_{n\cdot}}` for all :math:`a_{n\cdot}` in :math:`\boldsymbol{a}`

   :param N: Positive integer (number of rows)
   :type N: int
   :param P: Positive integer (number of columns)
   :type P: int
   :param a_vec: Row margin constraints (length N, values in [0, 2*P])
   :type a_vec: list of int
   :param return_log: If True, return log2 of result as float (default: False)
   :type return_log: bool, optional
   :returns:
      If return_log=False: The exact :math:`|\mathscr{A}_1|` value (arbitrary precision)

      If return_log=True: :math:`\log_2(|\mathscr{A}_1|)` as a float
   :rtype: int or float

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> zagar.getA1(N=50, P=50, a_vec=[25]*50)
      >>> zagar.getA1(N=50, P=50, a_vec=[25]*50, return_log=True)


getA2
-----

.. function:: getA2(N, P, phi0_vec, phi1_vec, return_log=False)

   Cardinality of :math:`\mathscr{A}_2`: Number of Arrays Subject to Column Margin Constraints

   Formula: prod(C(2*N, phi0_p+phi1_p) * C(phi0_j+phi1_p, phi1_p) * 2^(2*N-(phi0_p+phi1_p)))

   Mathematically,
   :math:`\prod_{p=1}^P \binom{2N}{\phi_{p,0}+\phi_{p,1}} \binom{\phi_{p,0}+\phi_{p,1}}{\phi_{p,0}} 2^{2N - (\phi_{p,0}+\phi_{p,1})}`

   :param N: Positive integer (number of rows)
   :type N: int
   :param P: Positive integer (number of columns)
   :type P: int
   :param phi0_vec: First constraint vector (length P, values in [0, 2*N])
   :type phi0_vec: list of int
   :param phi1_vec: Second constraint vector (length P, phi0_j+phi1_j <= 2*N)
   :type phi1_vec: list of int
   :param return_log: If True, return log2 of result as float (default: False)
   :type return_log: bool, optional
   :returns:
      If return_log=False: The exact :math:`|\mathscr{A}_2|` value (arbitrary precision)

      If return_log=True: :math:`\log_2(|\mathscr{A}_2|)` as a float
   :rtype: int or float

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> zagar.getA2(N=50, P=50, phi0_vec=[20]*50, phi1_vec=[22]*50)
      >>> zagar.getA2(N=50, P=50, phi0_vec=[20]*50, phi1_vec=[22]*50, return_log=True)


compareA1A2
-----------

.. function:: compareA1A2(a_vec, phi0_vec, phi1_vec, use_large_NP=False)

   Compare Cardinalities of :math:`\mathscr{A}_1` and :math:`\mathscr{A}_2`

   This function computes :math:`|\mathscr{A}_1|` and :math:`|\mathscr{A}_2|` and compares their sizes.
   Optionally, it can also compute an asymptotic approximation for large :math:`N`, :math:`P`
   based on entropy calculations.

   :param a_vec: Row margin constraints (length N)
   :type a_vec: list of int
   :param phi0_vec: First constraint vector (length P)
   :type phi0_vec: list of int
   :param phi1_vec: Second constraint vector (length P, same length as phi0_vec)
   :type phi1_vec: list of int
   :param use_large_NP: If True, compute asymptotic comparison using entropy criterion (default: False)
   :type use_large_NP: bool, optional
   :returns: Dictionary with keys:

      - ``'log2_A1'``: float - log2 of :math:`|\mathscr{A}_1|`
      - ``'log2_A2'``: float - log2 of :math:`|\mathscr{A}_2|`
      - ``'outcome'``: str - "A1 is larger" or "A2 is at least as large"

      If use_large_NP=True, also includes:

      - ``'large_NP_stat'``: float - the asymptotic criterion value (LHS - RHS)
      - ``'large_NP_outcome'``: str - "A1 is roughly larger" or "A2 is roughly at least as large"

   :rtype: dict

   .. note::

      The asymptotic condition for A1 > A2 (when use_large_NP=True) is:
          H(``a_bar``) > H(``f0``, ``f1``, 1-``f0``-``f1``) - (``f0`` + ``f1``)
      where:
          - ``a_bar`` = ``a_vec`` / (2*P), the normalized row margins
          - ``f0`` = ``phi0_vec`` / (2*N), ``f1`` = ``phi1_vec`` / (2*N), the normalized column margins
          - H() denotes entropy

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> result = zagar.compareA1A2([25]*50, [20]*50, [22]*50)
      >>> print(result['outcome'])
      >>> result = zagar.compareA1A2([25]*50, [20]*50, [22]*50, use_large_NP=True)
      >>> print(result['large_NP_stat'])


get_constraints
---------------

.. function:: get_constraints(A, X)

   Compute Marginal Constraint Vectors from Admixed Array

   Given binary matrices :math:`\mathbf{A}` (ancestry) and :math:`\mathbf{X}` (allele dosage)
   that make up the admixed array :math:`[\mathbf{A},\mathbf{X}]`, computes three marginal
   constraint vectors used in constrained admixed array enumeration.

   :param A: Binary matrix of shape (N, 2P) representing ancestry assignments.
      A[n, p] = 1 means individual n has ancestry 1 at position p.
      May contain np.nan values for missing data.
   :type A: np.ndarray
   :param X: Binary matrix of shape (N, 2P) representing allele values.
      X[n, p] = 1 means individual n has allele 1 at position p.
      May contain np.nan values for missing data.
   :type X: np.ndarray
   :returns: Tuple of three lists:

      - ``a_vec``: list of int - Length N. Entry n is the row sum of A (total ancestry-1 count for
        individual n). This is the row margin constraint for :math:`\mathscr{A}_1`.
        NA values in A are ignored.
      - ``phi0_vec``: list of int - Length P. Entry p is the sum of X values at positions where A=0,
        across columns p and P+p for all individuals.
        This is the first column margin constraint for :math:`\mathscr{A}_2`.
      - ``phi1_vec``: list of int - Length P. Entry p is the sum of X values at positions where A=1,
        across columns p and P+p for all individuals.
        This is the second column margin constraint for :math:`\mathscr{A}_2`.

   :rtype: tuple of (list, list, list)

   .. note::

      If either A or X contains NA/NaN values, a warning is issued but computation
      continues. NA values are ignored in all sums (they contribute 0 to the totals).

   **Examples**

   .. code-block:: python

      >>> import numpy as np
      >>> import zagar
      >>> np.random.seed(42)
      >>> N, P = 6, 5
      >>> A = np.random.randint(0, 2, size=(N, 2*P)).astype(float)
      >>> X = np.random.randint(0, 2, size=(N, 2*P)).astype(float)
      >>> a_vec, phi0_vec, phi1_vec = zagar.get_constraints(A, X)
      >>> len(a_vec) == N
      True
      >>> len(phi0_vec) == P and len(phi1_vec) == P
      True

      >>> # With missing data
      >>> A[0, 0] = np.nan
      >>> X[1, 2] = np.nan
      >>> a_vec, phi0_vec, phi1_vec = zagar.get_constraints(A, X)  # warns but continues


has_openmp
----------

.. function:: has_openmp()

   Check if OpenMP support is available.

   Returns True if the package was compiled with OpenMP support,
   False otherwise. When OpenMP is not available, getA12_parallel
   falls back to serial execution.

   :returns: True if OpenMP is available, False otherwise.
   :rtype: bool

   **Examples**

   .. code-block:: python

      >>> import zagar
      >>> zagar.has_openmp()
      False  # or True if compiled with OpenMP


Module Attributes
-----------------

.. data:: __version__

   Current version of the zagar package.

   .. code-block:: python

      >>> import zagar
      >>> zagar.__version__
      '0.1.0'