bioat.lib package

Submodules

bioat.lib.libalignment module

Library for alignment, depends on Biopython.

author: Herman Huanan Zhao
email: hermanzhaozzzz@gmail.com
homepage: https://github.com/hermanzhaozzzz

example 1:
bioat.lib.libalignment
<in python console>:
>>> from Bio.Seq import Seq
>>> from bioat.lib.libalignment import (
...     get_aligned_seq,
...     get_alignment_info,
...     get_best_alignment,
...     instantiate_pairwise_aligner,
... )
>>> aligner = instantiate_pairwise_aligner()
>>> alignment = get_best_alignment(
...     Seq("GGCACTGCGGCTGGAAAAAAAAAAAAAAAGT"),
...     Seq("GGCAGCGGCTGGAAAAAAAAAAAAAAAAGGT"),
...     aligner=aligner,
...     consider_strand=True,
... )
>>> res = get_aligned_seq(alignment, reverse=False)
>>> print(res)
>>> res = get_aligned_seq(alignment, reverse=True)
>>> print(res)
>>> res = get_alignment_info(alignment, reverse=False)
>>> print(res)
>>> res = get_alignment_info(alignment, reverse=True)
>>> print(res)
bioat.lib.libalignment.get_aligned_seq(alignment: Alignment, reverse: bool = False, letn_match=False) dict[source]

Retrieve aligned sequence information from a Bio.Align.Alignment object.

Parameters:
  • alignment (Bio.Align.Alignment) – A Bio.Align.Alignment object containing the alignment information.

  • reverse (bool, optional) – Whether to return reversed sequence information. Defaults to False.

Returns:

A dictionary containing the alignment details with the following keys:
  • ‘reference_seq’ (str): The aligned reference sequence.

  • ‘aln_info’ (str): The alignment information string, indicating matches (‘|’), mismatches (‘.’), and gaps (‘-’).

  • ‘target_seq’ (str): The aligned target sequence.

Return type:

dict

Example

>>> result = get_aligned_seq(alignment)
>>> print(result)
{
    'reference_seq': 'GGCACTGCGGCTGGAAAAAAAAAAAAAAA--GT',
    'aln_info':      '--..|.|||||||||||||||||||||||--||',
    'target_seq':    '--GGCAGCGGCTGGAAAAAAAAAAAAAAAAGGT'
}
bioat.lib.libalignment.get_alignment_info(alignment: Alignment, reverse: bool = False, letn_match=False, log_level='WARNING')[source]

Analyze a Bio.Align.Alignment object to extract alignment metrics.

This function calculates the match count, mismatch count, gap count, alignment score, and start and end indices for a given alignment.

Parameters:
  • alignment (Bio.Align.Alignment) – A Bio.Align.Alignment object containing the alignment information.

  • reverse (bool, optional) – Whether to return reversed sequence information. Defaults to False.

Returns:

A list containing the following metrics:

  • match_count (int): The number of matches in the alignment.

  • mismatch_count (int): The number of mismatches in the alignment.

  • gap_count (int): The number of gaps within the sgRNA alignment region. Note: This counts only gaps within the alignment region, not total gaps.

  • alignment_score (float): The alignment score.

  • start_index (int): The start index of the sgRNA alignment region.

  • end_index (int): The end index of the sgRNA alignment region.

Return type:

list

Notes

The gap count specifically counts gaps within the sgRNA alignment region, not the total gaps in the entire sequence. For example:

    Reference: AGTGGTAAGAAGAAGACGAGACATAATGAG
               ------||||||||||||||.|----||||
    Target:    ------AAGAAGAAGACGAGCC----TGAG

In this example: Gap count = 4 (within sgRNA alignment region), not 10 (total gaps).
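
The counting rule in this note can be sketched in a few lines of plain Python. This is an illustration only, not the library's implementation: the helper name count_alignment_events is hypothetical, and it assumes that the aligned region is whatever remains after leading and trailing gap characters are stripped from the alignment string.

```python
def count_alignment_events(aln_info: str) -> tuple[int, int, int]:
    """Count matches, mismatches and internal gaps in an alignment string.

    Hypothetical helper: '|' is a match, '.' a mismatch, '-' a gap.
    Leading and trailing gaps are treated as lying outside the aligned region.
    """
    region = aln_info.strip("-")   # drop flanking gaps only
    matches = region.count("|")
    mismatches = region.count(".")
    gaps = region.count("-")       # any gap left over is internal
    return matches, mismatches, gaps


aln = "------||||||||||||||.|----||||"
print(count_alignment_events(aln))  # (19, 1, 4)
```

The six leading gap characters are ignored, so only the four internal gaps are counted.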

Example

The returned list will be as follows: return_list = [match_count, mismatch_count, gap_count, alignment_score, start_index, end_index]

bioat.lib.libalignment.get_best_alignment(seq_a: Seq, seq_b: Seq, aligner: PairwiseAligner, consider_strand=True)[source]

Get the best alignment, ranked by highest alignment score.

By default the strand is considered: seq_b.reverse_complement() is also aligned, and the alignment with the highest score is returned.

Parameters:
  • seq_a (Seq) – The first sequence to align.

  • seq_b (Seq) – The second sequence to align.

  • aligner (PairwiseAligner) – The aligner returned by instantiate_pairwise_aligner.

  • consider_strand (bool, optional) – Whether to also try seq_b.reverse_complement(). Defaults to True.

Returns:

Alignment with a new attribute: alignment.is_a_reverse_complement_alignment (True / False)

Return type:

Alignment
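
The strand handling described above can be illustrated without Biopython. The sketch below is not the library's code: reverse_complement, toy_score and best_orientation are hypothetical names, toy_score is a gap-free stand-in for a real aligner score, and preferring the forward orientation on a tie is an assumption.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_score(a: str, b: str) -> int:
    """Stand-in for an aligner score: number of identical positions, no gaps."""
    return sum(x == y for x, y in zip(a, b))


def best_orientation(seq_a: str, seq_b: str, consider_strand: bool = True):
    """Return (score, is_reverse_complement) for the better orientation."""
    best = (toy_score(seq_a, seq_b), False)
    if consider_strand:
        rc = (toy_score(seq_a, reverse_complement(seq_b)), True)
        if rc[0] > best[0]:  # assumption: ties keep the forward orientation
            best = rc
    return best


print(best_orientation("AAGT", "ACTT"))  # (4, True)
```

The boolean in the result plays the role of the is_a_reverse_complement_alignment attribute described under Returns.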

bioat.lib.libalignment.instantiate_pairwise_aligner(scoring_match: float = 1, penalty_mismatch: float = 0.8, penalty_gap_open: float = -5, penalty_gap_extension: float = -2, penalty_query_left_gap_score: float = 0, penalty_query_right_gap_score: float = 0, mode='global', letn_match=False, score_matrix_dict: None | dict = None, log_level='DEBUG') PairwiseAligner[source]

Returns a PairwiseAligner object.

Parameters:
  • scoring_match (float, optional) – Scoring for matches. Defaults to 1.

  • penalty_mismatch (float, optional) – Penalty for mismatches. Defaults to 0.8.

  • penalty_gap_open (float, optional) – Penalty for opening a gap. Defaults to -5.

  • penalty_gap_extension (float, optional) – Penalty for extending a gap. Defaults to -2.

  • penalty_query_left_gap_score (float, optional) – Penalty for left query gap. Defaults to 0.

  • penalty_query_right_gap_score (float, optional) – Penalty for right query gap. Defaults to 0.

  • mode (str, optional) – Alignment mode, either “global” or “local”. Defaults to “global”.

  • letn_match (bool, optional) – Whether to treat base ‘N’ as a match. Defaults to False.

  • log_level (str, optional) – Logging level. Defaults to “DEBUG”.

  • score_matrix_dict (dict, optional) – Custom scoring matrix for bases or amino acids. Defaults to None.

Returns:

An object instantiated from PairwiseAligner.

Return type:

PairwiseAligner

Details for score_matrix_dict:

You can use any alphabet and any scores, such as AGCTN, the 20 amino acids plus 2 abnormal ones, or any other characters. For example:

score_matrix_dict = {
    ("A", "A"): scoring_match, ("A", "G"): penalty_mismatch, ("A", "C"): penalty_mismatch, ("A", "T"): penalty_mismatch, ("A", "N"): scoring_match,
    ("G", "A"): penalty_mismatch, ("G", "G"): scoring_match, ("G", "C"): penalty_mismatch, ("G", "T"): penalty_mismatch, ("G", "N"): scoring_match,
    ("C", "A"): penalty_mismatch, ("C", "G"): penalty_mismatch, ("C", "C"): scoring_match, ("C", "T"): penalty_mismatch, ("C", "N"): scoring_match,
    ("T", "A"): penalty_mismatch, ("T", "G"): penalty_mismatch, ("T", "C"): penalty_mismatch, ("T", "T"): scoring_match, ("T", "N"): scoring_match,
    ("N", "A"): scoring_match, ("N", "G"): scoring_match, ("N", "C"): scoring_match, ("N", "T"): scoring_match, ("N", "N"): scoring_match,
}
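
A dictionary of this shape does not have to be typed by hand. The following is a minimal sketch, assuming that every pair involving the wildcard base should score as a match; build_score_matrix is a hypothetical helper, and the scores 1 and -1 are illustrative values rather than the function's defaults.

```python
from itertools import product


def build_score_matrix(alphabet, scoring_match, penalty_mismatch, wildcard="N"):
    """Build a {(a, b): score} dict; pairs involving the wildcard score as matches."""
    matrix = {}
    for a, b in product(alphabet, repeat=2):
        is_match = a == b or wildcard in (a, b)
        matrix[(a, b)] = scoring_match if is_match else penalty_mismatch
    return matrix


# Illustrative scores only.
score_matrix_dict = build_score_matrix("AGCTN", scoring_match=1, penalty_mismatch=-1)
print(len(score_matrix_dict))  # 25
```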

bioat.lib.libcircos module

Adapted from pyCircos (https://github.com/ponnhide/pyCircos), with thanks to its author.

class bioat.lib.libcircos.Garc(arc_id=None, record=None, size=1000, interspace=3, raxis_range=(500, 550), facecolor=None, edgecolor='#303030', linewidth=0.75, label=None, labelposition=0, labelsize=10, label_visible=False)[source]

Bases: object

Class representation for an arc section in a graphic circle.

Variables:
  • colorlist (list of str) – List of color codes for the arc sections.

  • _arcnum (int) – Counter for the number of arcs.

__setitem__(key, item)[source]

Sets the attribute key with the value item.

__getitem__(key)[source]

Retrieves the value of the attribute key.
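
Dictionary-style access to attributes, as these two methods describe, can be sketched as follows. AttrItemAccess is a hypothetical stand-in class, not Garc itself, and using setattr/getattr is an assumption about how such access might be implemented.

```python
class AttrItemAccess:
    """Hypothetical stand-in showing dict-style access to attributes."""

    def __init__(self, size=1000):
        self.size = size

    def __setitem__(self, key, item):
        # obj[key] = item stores item as an attribute named key.
        setattr(self, key, item)

    def __getitem__(self, key):
        # obj[key] reads the attribute named key.
        return getattr(self, key)


arc = AttrItemAccess()
arc["label"] = "chr1"
print(arc["size"], arc.label)  # 1000 chr1
```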

__init__(arc_id=None, record=None, size=1000, interspace=3, raxis_range=(500, 550), facecolor=None, edgecolor='#303030', linewidth=0.75, label=None, labelposition=0, labelsize=10, label_visible=False)[source]

Initializes the Garc object with specified parameters.

Parameters:
  • arc_id (str, optional) – Unique identifier for the Garc class object. If None, a unique ID is generated. Default is None.

  • record (Bio.SeqRecord or str, optional) – Bio.SeqRecord object or NCBI accession number for an annotated sequence. Default is None.

  • size (int, optional) – Width of the arc section, default is 1000. Adjusted if record is provided.

  • interspace (float, optional) – Distance angle (degrees) to the adjacent arc section. Default is 3.

  • raxis_range (tuple of int, optional) – Radial axis range for line plotting. Default is (500, 550).

  • facecolor (str or tuple, optional) – Color for filling the arc section. Default is automatically set.

  • edgecolor (str or tuple, optional) – Color for the edge of the filled area. Default is “#303030”.

  • linewidth (float, optional) – Width of the edge line. Default is 0.75.

  • label (str, optional) – Label for the arc section. Default is None.

  • labelposition (int, optional) – Relative label height from the center. Default is 0.

  • labelsize (int, optional) – Font size of the label. Default is 10.

  • label_visible (bool, optional) – Determines if the label is visible on the arc section. Default is False.

Raises:

ValueError – If no match for the provided NCBI accession number is found in record.

calc_density(positions, window_size=1000)[source]

Calculates density values in a sliding window based on given positions.

Parameters:
  • positions (list of int or tuple) – A list of x-coordinate values or a tuple containing two x-coordinate values. Each value should be in the range of 0 to the size of the Garc object.

  • window_size (int, optional) – The size of the sliding window. Defaults to 1000.

Raises:

ValueError – If an inappropriate value or values are provided for positions.

Returns:

A list containing the calculated density values.

Return type:

list
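
One plausible reading of this calculation is a count per consecutive window. The sketch below rests on assumptions not stated in the text: windows do not overlap, each element of positions is either an int or a (start, end) tuple, and an interval is counted in every window it overlaps. window_density is a hypothetical name.

```python
def window_density(positions, size, window_size=1000):
    """Count how many positions fall in each consecutive window of [0, size)."""
    densities = []
    for start in range(0, size, window_size):
        end = min(start + window_size, size)
        count = 0
        for pos in positions:
            if isinstance(pos, tuple):
                # An interval counts if it overlaps the window at all.
                lo, hi = pos
                if lo < end and hi >= start:
                    count += 1
            elif start <= pos < end:
                count += 1
        densities.append(count)
    return densities


print(window_density([10, 20, 1500, 2500, 2600], size=3000))  # [2, 1, 2]
```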

calc_nnratio(n1='G', n2='C', window_size=1000, step_size=None)[source]

Calculates the ratio of nucleotide base frequencies for two specified bases over multiple windows along a sequence.

Parameters:
  • n1 (str) – The first nucleotide base to be compared. Must be one of “ATGC”. Default is “G”.

  • n2 (str) – The second nucleotide base to be compared. Must be one of “ATGC”. Default is “C”.

  • window_size (int) – Size of the sliding window. Default is 1000.

  • step_size (int, optional) – Size of the sliding step. If not provided, defaults to window_size.

Raises:

ValueError – If no record is provided.

Returns:

An array containing the computed ratios of n1 to n2 for each window.

Return type:

np.array

calc_nnskew(n1='G', n2='C', window_size=1000, step_size=None)[source]

Calculates n1,n2 skew (n1-n2)/(n1+n2) for multiple windows along the sequence.

Parameters:
  • n1 (str, optional) – The first of the two nucleotide bases to be compared. The default is “G”.

  • n2 (str, optional) – The second of the two nucleotide bases to be compared. The default is “C”.

  • window_size (int, optional) – Size of the sliding window. The default is 1000.

  • step_size (int, optional) – Size of the sliding step. The default is window_size.

Raises:

ValueError – If no record is provided.

Returns:

An array of the skews computed by this method.

Return type:

np.array
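
The skew formula (n1-n2)/(n1+n2) can be worked through on a plain string. This is a sketch, not the method itself: nn_skew is a hypothetical function, it returns a list rather than an np.array, and assigning 0.0 to a window that contains neither base is an assumption.

```python
def nn_skew(seq, n1="G", n2="C", window_size=1000, step_size=None):
    """Return (n1 - n2) / (n1 + n2) for each window along seq."""
    if step_size is None:
        step_size = window_size
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), step_size):
        window = seq[start:start + window_size]
        c1, c2 = window.count(n1), window.count(n2)
        total = c1 + c2
        # Assumption: a window with neither base gets a skew of 0.0.
        skews.append((c1 - c2) / total if total else 0.0)
    return skews


print(nn_skew("GGGCCCCG", window_size=4))  # [0.5, -0.5]
```

The first window "GGGC" has three G and one C, giving (3-1)/(3+1) = 0.5; the second window "CCCG" gives -0.5.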

colorlist = ['#ff8a80', '#ff80ab', '#ea80fc', '#b388ff', '#8c9eff', '#82b1ff', '#84ffff', '#a7ffeb', '#b9f6ca', '#ccff90', '#f4ff81', '#ffff8d', '#ffe57f', '#ffd180', '#ff9e80', '#bcaaa4', '#eeeeee', '#b0bec5', '#ff5252', '#ff4081', '#e040fb', '#7c4dff', '#536dfe', '#448aff', '#18ffff', '#64ffda', '#69f0ae', '#b2ff59', '#eeff41', '#ffff00', '#ffd740', '#ffab40', '#ff6e40', '#a1887f', '#e0e0e0', '#90a4ae']
class bioat.lib.libcircos.Gcircle(fig=None, figsize=None)[source]

Bases: object

A Gcircle class object provides a circle whose diameter is 1000 (a.u.) as a drawing space. Any graph (line plot, scatter plot, barplot, heatmap, and chordplot) can be placed on the space by specifying the raxis_range (from 0 to 1000) and the corresponding Garc class object.

__init__(fig=None, figsize=None)[source]

Initializes the circular map.

Parameters:
  • fig (matplotlib.pyplot.Figure, optional) – A Matplotlib Figure object. If not provided, a new figure will be created.

  • figsize (tuple, optional) – A tuple specifying the size of the figure for the circular map, e.g., (width, height).

add_garc(garc)[source]

Add a new Garc class object to the garc_dict.

Parameters:

garc (Garc) – The Garc class object to be added.

Returns:

None

barplot(garc_id, data, positions=None, width=None, raxis_range=(550, 600), rlim=None, base_value=None, facecolor=None, edgecolor='#303030', linewidth=0.0, spine=False)[source]

Draws a bar plot within the specified arc of the Garc class object.

This method visualizes numerical data as bars in a sector corresponding to the specified arc of a Garc class object identified by garc_id.

Parameters:
  • garc_id (str) – The ID of the Garc class object. It should exist in the Gcircle object’s garc_dict.

  • data (list or numpy.ndarray) – The numerical data used to generate the plot.

  • positions (list or numpy.ndarray, optional) – The x-coordinates of the values in data when plotted in rectangular coordinates. Each value should be within the range of the Garc class object’s size identified by garc_id. If not provided, coordinates are automatically generated based on the length of data.

  • width (float or list of float, optional) – The width(s) of the bars. The default width is determined by garc_object.size / len(data).

  • raxis_range (tuple of int, optional) – Range for the radial axis where the bar plot is drawn, defaulting to (550, 600).

  • rlim (tuple of int, optional) – Specifies the top and bottom limits for the radial data coordinates. If not provided, it defaults to the minimum and maximum values of data.

  • base_value (float, optional) – Height of the baseline in data coordinates; the area between this line and the data line is filled with facecolor. Defaults to None.

  • facecolor (str or tuple or list, optional) – Specifies the face color(s) of the bars. If a list is provided, its length must match that of data. Default is None.

  • edgecolor (str or tuple, optional) – Color of the edges of the bars, defaulting to “#303030”.

  • linewidth (float, optional) – Width of the edge lines of the bars; defaults to 0.0 (no lines).

  • spine (bool, optional) – If True, displays the spines of the Garc object on the arc section. Defaults to False.

Returns:

This method does not return any value.

Return type:

None

chord_plot(start_list, end_list, facecolor=None, edgecolor=None, linewidth=0.0)[source]

Visualize interrelationships between data.

Parameters:
  • start_list (tuple) – Start data location of linked data. The tuple is composed of four parameters:

    • arc_id (str): The ID of the first Garc class object to be compared. The ID should be in Gcircle object’s garc_dict.

    • edge_position1 (int): The minimal x coordinate on the Garc class object when the plot is drawn on the rectangular coordinates.

    • edge_position2 (int): The maximal x coordinate on the Garc class object when the plot is drawn on the rectangular coordinates.

    • raxis_position (int): The base height for the drawing chord.

  • end_list (tuple) – End data location of linked data. The tuple is composed of four parameters:

    • arc_id (str): The ID of the second Garc class object to be compared. The ID should be in Gcircle object’s garc_dict.

    • edge_position1 (int): The minimal x coordinate on the Garc class object when the plot is drawn on the rectangular coordinates.

    • edge_position2 (int): The maximal x coordinate on the Garc class object when the plot is drawn on the rectangular coordinates.

    • raxis_position (int): The base height for the drawing chord.

  • facecolor (str or tuple, optional) – Facecolor of the link. The default is None.

  • edgecolor (str or tuple, optional) – Edge color of the link. The default is “#303030”.

  • linewidth (float, optional) – Edge line width of the link. The default is 0.0.

Returns:

None
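
The four-element tuples expected for start_list and end_list can be assembled with a small helper. This is only a sketch: chord_end is a hypothetical function, the arc IDs and coordinates are made up, and the ordering check simply reflects the description of edge_position1 as the minimal and edge_position2 as the maximal x coordinate.

```python
def chord_end(arc_id, edge_position1, edge_position2, raxis_position):
    """Build one (arc_id, edge_position1, edge_position2, raxis_position) tuple."""
    if edge_position1 > edge_position2:
        raise ValueError("edge_position1 must not exceed edge_position2")
    return (arc_id, edge_position1, edge_position2, raxis_position)


# Hypothetical arc IDs and coordinates.
start_list = chord_end("chr1", 100, 400, 500)
end_list = chord_end("chr2", 2000, 2300, 500)
print(start_list, end_list)
```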

cmaps = [<matplotlib.colors.LinearSegmentedColormap object>, <matplotlib.colors.LinearSegmentedColormap object>, <matplotlib.colors.LinearSegmentedColormap object>, <matplotlib.colors.LinearSegmentedColormap object>]
colors = ['#f44336', '#e91e63', '#9c27b0', '#673ab7', '#3f51b5', '#2196f3', '#00bcd4', '#009688', '#4caf50', '#8bc34a', '#cddc39', '#ffeb3b', '#ffc107', '#ff9800', '#ff5722', '#795548', '#9e9e9e', '#607d8b']
featureplot(garc_id, feature_type=None, source=None, raxis_range=(550, 600), facecolor=None, edgecolor='#303030', linewidth=0.0, spine=False)[source]

Visualize sequence features with bar plots in the sector corresponding to the arc of the Garc class object specified by garc_id.

Parameters:
  • garc_id (str) – ID of the Garc class object. The ID should be in Gcircle object.garc_dict.

  • feature_type (str, optional) – Biological nature of the Bio.SeqFeature class objects. Accepts any value, but GenBank format requires registering a biological nature category for each sequence feature. If the value is “all”, all features in source will be drawn in the sector of the Garc class object specified by garc_id. The default is ‘all’.

  • source (list of Bio.SeqFeature, optional) – List of Bio.SeqFeature class objects. If not provided, record.features of the Garc class object specified by garc_id is used.

  • raxis_range (tuple of int, optional) – Radial axis range where the feature plot is drawn. The default is (550, 600).

  • facecolor (str or tuple, optional) – Facecolor(s) of the bars. If a list is provided, its length should match the data length. The default is None.

  • edgecolor (str or tuple, optional) – Edge color of the bars. The default is “#303030”.

  • linewidth (float, optional) – Edge line width of the bars. The default is 0.0.

  • spine (bool, optional) – If True, spines of the Garc object are shown on the arc section. The default is False.

Returns:

None

fillplot(garc_id, data, positions=None, raxis_range=(550, 600), rlim=None, base_value=None, facecolor=None, edgecolor='#303030', linewidth=0.0, spine=False)[source]

Fill a specified area in the sector corresponding to the arc of the Garc class object specified by garc_id.

Parameters:
  • garc_id (str) – ID of the Garc class object. The ID should be in Gcircle object.garc_dict.

  • data (list or numpy.ndarray) – Numerical data used for plot generation.

  • positions (list or numpy.ndarray, optional) – The x coordinates of the values in data on the Garc class object when the plot is drawn on the rectangular coordinates. Each coordinate value should be in the range 0 to the size of the Garc class object specified by garc_id. If positions are not given, proper coordinate values are generated according to the length of data. Default is None.

  • raxis_range (tuple of int, optional) – Radial axis range where the fill plot is drawn. Default is (550, 600).

  • rlim (tuple of int, optional) – The top and bottom r limits in data coordinates. If not given, the maximum and minimum values in data will be set to top and bottom, respectively. Default is (min(data), max(data)).

  • base_value (float, optional) – Base line height in data coordinates. The area between the base line and the data line is filled by facecolor. Default is None.

  • facecolor (str or tuple, optional) – Color for filling. Default is None.

  • edgecolor (str or tuple, optional) – Edge color of the filled area. Default is “#303030”.

  • linewidth (float, optional) – Edge line width. The default is 0.0.

  • spine (bool, optional) – If True, spines of the Garc object are shown on the arc section. Default is False.

Returns:

This function does not return any value.

Return type:

None

heatmap(garc_id, data, positions=None, width=None, raxis_range=(550, 600), cmap=None, vmin=None, vmax=None, edgecolor='#303030', linewidth=0.0, spine=False)[source]

Visualize magnitudes of data values by color scale in the sector corresponding to the arc of the Garc class object specified by garc_id.

Parameters:
  • garc_id (str) – ID of the Garc class object. The ID should be in Gcircle object.garc_dict.

  • data (list or numpy.ndarray) – Numerical data to be used for plot generation.

  • positions (list or numpy.ndarray, optional) – The x coordinates of the values in data on the Garc class object when the plot is drawn on the rectangular coordinates. Each coordinate value should be in the range 0 to size of the Garc class object specified by garc_id. If positions are not given, proper coordinates values are generated according to the length of data. Defaults to None.

  • width (float or list of float, optional) – Width(s) of the bars. Defaults to garc_object.size / len(data).

  • raxis_range (tuple, optional) – Radial axis range where heatmap is drawn. Default is (550, 600).

  • cmap (str or matplotlib.colors.Colormap, optional) – The mapping from data values to color space. Default is ‘Reds’.

  • vmin (float, optional) – Minimum data threshold for color scale. Defaults to min(data).

  • vmax (float, optional) – Maximum data threshold for color scale. Defaults to max(data).

  • edgecolor (str or tuple, optional) – Edge color of the bars. Default is “#303030”.

  • linewidth (float, optional) – Edge line width of the bars. Default is 0.0.

  • spine (bool, optional) – If True, spines of the Garc object are shown on the arc section. Default is False.

Returns:

None

lineplot(garc_id, data, positions=None, raxis_range=(550, 600), rlim=None, linestyle='solid', linecolor=None, linewidth=1.0, spine=False)[source]

Plot a line in the sector corresponding to the arc of the Garc class object specified by garc_id.

Parameters:
  • garc_id (str) – ID of the Garc class object. The ID should be in Gcircle object.garc_dict.

  • data (list or numpy.ndarray) – Numerical data to be used for plot generation.

  • positions (list or numpy.ndarray, optional) – The x coordinates of the values in data on the Garc class object when the plot is drawn on rectangular coordinates. Each coordinate value should be in the range 0 to size of the Garc class object specified by garc_id. If positions are not given, proper coordinate values are generated according to the length of data. The default is None.

  • raxis_range (tuple, optional) – Radial axis range where line plot is drawn. The default is (550, 600).

  • rlim (tuple, optional) – The top and bottom r limits in data coordinates. If rlim value is not given, the maximum value and the minimum value in data will be set to top and bottom, respectively. The default is None.

  • linestyle (str, optional) – Line style. The default is “solid”. Possible line styles are documented at https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html

  • linecolor (str or tuple, optional) – Color of the line plot. If linecolor is not given, the color is set according to the default color set of matplotlib. To specify the opacity of a line color, use the (r,g,b,a) or #RRGGBBAA format. The default is None.

  • linewidth (float, optional) – Edge line width. The default is 1.0.

  • spine (bool, optional) – If True, spines of the Garc object are shown on the arc section. The default is False.

Returns:

None
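
The relationship between rlim and raxis_range can be pictured as a linear rescaling of data values onto the radial band. The sketch below is an assumption about how such a mapping could work, not a description of the library's internals; to_radius is a hypothetical function, and placing flat data in the middle of the band is an arbitrary choice.

```python
def to_radius(data, raxis_range=(550, 600), rlim=None):
    """Linearly map data values onto the radial range."""
    if rlim is None:
        rlim = (min(data), max(data))
    bottom, top = rlim
    r0, r1 = raxis_range
    span = top - bottom
    if span == 0:
        # Assumption: flat data is drawn in the middle of the band.
        return [(r0 + r1) / 2 for _ in data]
    return [r0 + (r1 - r0) * (value - bottom) / span for value in data]


print(to_radius([0, 5, 10]))  # [550.0, 575.0, 600.0]
```

Passing an explicit rlim wider than the data keeps the plotted values away from the edges of the band.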

save(file_name='test', format='pdf', dpi=None)[source]

Saves the figure object of the Gcircle class to a file.

Parameters:
  • file_name (str, optional) – The name of the file to save the figure. Defaults to “test”.

  • format (str, optional) – The file format for the saved figure. Defaults to “pdf”.

  • dpi (int, optional) – The resolution of the saved figure in dots per inch. Defaults to None.

Returns:

None

scatterplot(garc_id, data, positions=None, raxis_range=(550, 600), rlim=None, markershape='o', markersize=5, facecolor=None, edgecolor='#303030', linewidth=0.0, spine=False)[source]

Draws a scatter plot on the sector corresponding to the arc of the Garc class object.

Parameters:
  • garc_id (str) – ID of the Garc class object. The ID should be in Gcircle object.garc_dict.

  • data (list or numpy.ndarray) – Numerical data used for plot generation.

  • positions (list or numpy.ndarray, optional) – The x coordinates of the values in data on the Garc class object when the plot is drawn on rectangular coordinates. Each coordinate value should be in the range 0 to the size of the Garc class object specified by garc_id. If positions are not provided, proper coordinate values are generated according to the length of data. Default is None.

  • raxis_range (tuple, optional) – Radial axis range where the scatter plot is drawn. Default is (550, 600).

  • rlim (tuple, optional) – The top and bottom r limits in data coordinates. Default is (min(data), max(data)) if not provided.

  • markershape (str, optional) – Marker shape. Default is “o”. Possible markers are listed at https://matplotlib.org/stable/gallery/lines_bars_and_markers/marker_reference.html.

  • markersize (float or list of float, optional) – Size(s) of the marker(s). Default is 5.

  • facecolor (str, tuple or list, optional) – Face color(s) of the markers. If a list is provided, its length should match the length of data. Default is None.

  • edgecolor (str or tuple, optional) – Edge color of the markers. Default is “#303030”.

  • linewidth (float, optional) – Edge line width of the markers. Default is 0.0.

  • spine (bool, optional) – If True, spines of the Garc object are shown in the arc section. Default is False.

Returns:

None

set_garcs(start=0, end=360)[source]

Visualize arc rectangles for Garc class objects in .garc_dict.

This method draws the arc rectangles of the Garc class objects in the drawing space. After executing this method, no new Garc class objects can be added to garc_dict, and a figure parameter representing a matplotlib.pyplot.figure object will be created in the Gcircle object.

Parameters:
  • start (int, optional) – Start angle of the circos plot. The range is -360 to 360. Defaults to 0.

  • end (int, optional) – End angle of the circos plot. The range is -360 to 360. Defaults to 360.

Returns:

None
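
To give a feel for how start, end, size and interspace might interact, the sketch below divides the angular range among arcs in proportion to their size after reserving each arc's interspace. This scheme is an assumption made for illustration and is not confirmed by the text above; allocate_arc_angles and the arc IDs are hypothetical.

```python
def allocate_arc_angles(arcs, start=0, end=360):
    """Split [start, end] degrees among arcs.

    arcs is a list of (arc_id, size, interspace_degrees) tuples.
    Returns {arc_id: (start_angle, end_angle)}.
    """
    total_size = sum(size for _, size, _ in arcs)
    total_gap = sum(gap for _, _, gap in arcs)
    usable = (end - start) - total_gap  # degrees left for the arcs themselves
    spans = {}
    angle = start
    for arc_id, size, gap in arcs:
        width = usable * size / total_size
        spans[arc_id] = (angle, angle + width)
        angle += width + gap  # leave the interspace before the next arc
    return spans


print(allocate_arc_angles([("a", 1000, 3), ("b", 3000, 3)]))
# {'a': (0, 88.5), 'b': (91.5, 357.0)}
```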

setspine(garc_id, raxis_range=(550, 600), facecolor='#30303000', edgecolor='#303030', linewidth=0.75)[source]

Sets the spines in the sector corresponding to the arc of the Garc class object specified by garc_id.

Parameters:
  • garc_id (str) – ID of the Garc class object. The ID must be present in Gcircle object.garc_dict.

  • raxis_range (tuple, optional) – Radial axis range where the spines are drawn. Defaults to (550, 600).

  • facecolor (str, optional) – Color for the spines area. Default is “#30303000”.

  • edgecolor (str, optional) – Edge color of the spines boundary area. Default is “#303030”.

  • linewidth (float, optional) – Edge line width of the spines boundary area. Default is 0.75.

Returns:

None

tickplot(garc_id, raxis_range=None, tickinterval=1000, tickpositions=None, ticklabels=None, tickwidth=1, tickcolor='#303030', ticklabelsize=10, ticklabelcolor='#303030', ticklabelmargin=10, tickdirection='outer', ticklabelorientation='vertical')[source]

Plots ticks on the arc of the Garc class object.

Parameters:
  • garc_id (str) – The ID of the Garc class object. The ID should be in Gcircle object.garc_dict.

  • raxis_range (tuple of int, optional) – Radial axis range where the tick plot is drawn. If tickdirection is “inner”, the default is (r0 - 0.5 * abs(r1 - r0), r0). If tickdirection is “outer”, the default is (r1, r1 + 0.5 * abs(r1 - r0)). r0 and r1 are defined as Garc_object.raxis_range[0] and Garc_object.raxis_range[1].

  • tickinterval (int, optional) – The interval between ticks. The default value is 1000. If tickpositions is provided, this value will be ignored.

  • tickpositions (list of int, optional) – Specific positions on the arc of the Garc class object where ticks should be placed. The values should be less than Garc_object.size.

  • ticklabels (list of int or str, optional) – Labels for the ticks on the arc of the Garc class object. The default value is the same as tickpositions.

  • tickwidth (float, optional) – The width of the ticks. The default value is 1.0.

  • tickcolor (str or float, optional) – The color of the ticks. The default value is “#303030”.

  • ticklabelsize (float, optional) – The font size of the tick labels. The default value is 10.

  • ticklabelcolor (str, optional) – The color of the tick labels. The default value is “#303030”.

  • ticklabelmargin (float, optional) – The margin of the tick labels. The default value is 10.

  • tickdirection (str, optional) – The direction of the ticks (“outer” or “inner”). The default value is “outer”.

  • ticklabelorientation (str, optional) – The orientation of the tick labels (“vertical” or “horizontal”). The default value is “vertical”.

Returns:

None
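
The default raxis_range formulas given for tickplot can be evaluated directly. The function below is a hypothetical helper that only restates those two formulas; it is not part of the library.

```python
def default_tick_raxis_range(garc_raxis_range, tickdirection="outer"):
    """Apply the documented default formulas for the tick radial range."""
    r0, r1 = garc_raxis_range
    half = 0.5 * abs(r1 - r0)
    if tickdirection == "inner":
        return (r0 - half, r0)
    if tickdirection == "outer":
        return (r1, r1 + half)
    raise ValueError('tickdirection must be "outer" or "inner"')


print(default_tick_raxis_range((500, 550)))           # (550, 575.0)
print(default_tick_raxis_range((500, 550), "inner"))  # (475.0, 500)
```

For an arc drawn at (500, 550), outer ticks therefore occupy the 25 units just outside the arc, and inner ticks the 25 units just inside it.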

class bioat.lib.libcircos.Tarc(arc_id=None, tree=None, format='newick', interspace=3, raxis_range=(900, 950), facecolor=None, edgecolor='#303030', linewidth=0, label=None, labelposition=0, labelsize=10, label_visible=False)[source]

Bases: Garc

__init__(arc_id=None, tree=None, format='newick', interspace=3, raxis_range=(900, 950), facecolor=None, edgecolor='#303030', linewidth=0, label=None, labelposition=0, labelsize=10, label_visible=False)[source]

Initializes a Tarc class object.

Parameters:
  • arc_id (str, optional) – Unique identifier for the Tarc class object. If not provided, a unique ID is generated automatically. Default is None.

  • tree (str) – File name of phylogenetic tree.

  • format (str) – Format of the phylogenetic tree. Default is “newick”.

  • interspace (float, optional) – Distance angle (degrees) to the adjacent arc section in clockwise sequence. The actual interspace size is determined by the ratio of size to the combined sum of the size and interspace values of the Garc class objects in the Gcircle class object. Default is 3.

  • raxis_range (tuple, optional) – Radial axis range where the line plot is drawn. Default is (900, 950).

  • facecolor (str or tuple, optional) – Color for filling. Default is None.

  • edgecolor (str or tuple, optional) – Edge color of the filled area. Default is “#303030”.

  • linewidth (float, optional) – Edge line width. Default is 0.

  • label (str, optional) – Label of the arc section. Default is None.

  • labelposition (int, optional) – Relative label height from the center of the arc section. Default is 0.

  • labelsize (int, optional) – Font size of the label. Default is 10.

  • label_visible (bool, optional) – If True, the label of the Garc object is shown on the arc section. Default is False.

Returns:

None

class bioat.lib.libcircos.Tcircle(fig=None, figsize=None)[source]

Bases: Gcircle

Tcircle is a subclass of Gcircle, so all methods implemented in the Gcircle class can also be used. In addition, the Tcircle class provides the methods add_tarc, set_tarcs, plot_tree, and plot_highlight.

__getattr__(name)[source]

Retrieves attributes of the object.

Parameters:

name (str) – The name of the attribute to retrieve.

Returns:

The corresponding _garc_dict if name is ‘tarc_dict’.

Return type:

dict
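
Exposing one dictionary under a second name through __getattr__ can be sketched like this. AliasDemo is a hypothetical class written for illustration, not Tcircle itself; it relies on the fact that Python calls __getattr__ only when normal attribute lookup fails.

```python
class AliasDemo:
    """Hypothetical class that exposes _garc_dict under the name tarc_dict."""

    def __init__(self):
        self._garc_dict = {}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "tarc_dict":
            return self._garc_dict
        raise AttributeError(name)


demo = AliasDemo()
demo.tarc_dict["t1"] = "tree arc"
print(demo._garc_dict)  # {'t1': 'tree arc'}
```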

__init__(fig=None, figsize=None)[source]

Initializes the circular mapping object.

Parameters:
  • fig (matplotlib.pyplot.figure, optional) – Matplotlib Figure class object to be used.

  • figsize (tuple, optional) – Size of the figure for the circular map.

add_tarc(tarc)[source]

Adds a new Tarc or Garc object to the tarc_dict.

Parameters:

tarc (Tarc or Garc) – The Tarc or Garc object to be added.

Returns:

None

plot_highlight(tarc_id, highlight_dict=None)[source]

Add highlight for a specific clade under the given internal clade.

Parameters:
  • tarc_id (str) – ID of the Tarc class object. The ID should be in Tcircle object’s tarc_dict.

  • highlight_dict (dict, optional) –

    A dictionary composed of pairs of internal clade name and a sub-dict. Instead of clade name, tuples of terminal clade names can also be used. A sub-dict is composed of the following key-value pairs:

    • color (str):

      Color of the highlight for clades. The default is “#000000”.

    • alpha (float):

      Alpha of the highlight for clades. The default is 0.25.

    • label (str, optional):

      Label for the highlight. The default is None.

    • fontsize (float):

      Font size of the label. The default is 10.

    • y (float):

      Y location of the text. The default is at the bottom edge of the highlight.

Returns:

None

plot_tree(tarc_id, rlim=(0, 700), cladevisual_dict=None, highlight_dict=None, linecolor='#303030', linewidth=0.5)[source]

Draw a circular phylogenetic tree.

Parameters:
  • tarc_id (str) – The ID of the Tarc class object. The ID should be in Tcircle object.tarc_dict.

  • rlim (tuple of int, optional) – The top and bottom radial limits in data coordinates. Defaults to (0, 700).

  • cladevisual_dict (dict, optional) –

    A dictionary containing clade visualization parameters, structured as pairs of clade name and a sub-dictionary with the following keys:

    • size (float): Size of the dot. Defaults to 5.

    • color (str or float): Face color of the dot. Defaults to “#303030”.

    • edgecolor (str or float): Edge line color of the dot. Defaults to “#303030”.

    • linewidth (float): Edge line width of the dot. Defaults to 0.5.

  • highlight_dict (dict, optional) –

    A dictionary containing clade highlight parameters, which can also use tuples of terminal clade names. The structure includes:

    • color (str): Color for highlighting clades. Defaults to “#000000”.

    • alpha (float): Transparency level for highlights. Defaults to 0.25.

    • label (str): Label for the highlight. Defaults to None.

    • fontsize (float): Font size of the label. Defaults to 10.

    • y (float): Y location of the text. Defaults to the bottom edge of the highlight.

  • linecolor (str or tuple, optional) – Color of the tree line. Defaults to “#303030”.

  • linewidth (float) – Line width of the tree. Defaults to 0.5.

Returns:

None
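
The nested dictionaries accepted by plot_tree can be sketched as plain Python literals. The clade names (‘clade_A’, ‘leaf_1’, ‘leaf_2’), the label text, and the Tarc ID ‘tree1’ are hypothetical; only the sub-dictionary keys and their default values come from the parameter descriptions above.

```python
# Hypothetical clade names; keys and defaults follow the plot_tree parameters.
cladevisual_dict = {
    "clade_A": {
        "size": 5,               # size of the dot
        "color": "#303030",      # face color of the dot
        "edgecolor": "#303030",  # edge line color of the dot
        "linewidth": 0.5,        # edge line width of the dot
    },
}

highlight_dict = {
    # The key is an internal clade name or a tuple of terminal clade names.
    ("leaf_1", "leaf_2"): {
        "color": "#000000",
        "alpha": 0.25,
        "label": "group 1",
        "fontsize": 10,
    },
}

# Assumed usage on an existing Tcircle instance whose tarc_dict holds "tree1":
# tcircle.plot_tree("tree1", rlim=(0, 700),
#                   cladevisual_dict=cladevisual_dict,
#                   highlight_dict=highlight_dict)
```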

set_tarcs(start=0, end=360)[source]

Visualizes the arc rectangles of Tarc objects in the .garc_dict.

After execution, no new Tarc objects can be added to garc_dict, and a matplotlib.pyplot.figure will be created in the Tcircle object.

Parameters:
  • start (int, optional) – Start angle of the circos plot, in the range of -360 to 360. Defaults to 0.

  • end (int, optional) – End angle of the circos plot, in the range of -360 to 360. Defaults to 360.

Returns:

None

bioat.lib.libcircos.table_hg38_chromosome_length()[source]
bioat.lib.libcircos.table_hg38_cytoband()[source]
bioat.lib.libcircos.table_mm10_chromosome_length()[source]
bioat.lib.libcircos.table_mm10_cytoband()[source]

bioat.lib.libcolor module

author: Herman Huanan Zhao email: hermanzhaozzzz@gmail.com homepage: https://github.com/hermanzhaozzzz

example 1:
bioat list
<in shell>:

$ bioat list

<in python console>:
>>> from bioat.cli import Cli
>>> bioat = Cli()
>>> bioat.list()
>>> print(bioat.list())

bioat.lib.libcolor.convert_hex_to_rgb(hex_color: str) tuple[source]

Convert HEX color to RGB color.

Parameters:

hex_color – str, like ‘#FFFFAA’

Returns:

tuple, like (255, 255, 170)

bioat.lib.libcolor.convert_rgb_to_hex(rgb_color: tuple) str[source]

Convert RGB color to HEX color.

Parameters:

rgb_color – tuple, like (255, 255, 170)

Returns:

str, like ‘#FFFFAA’
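
The two conversions can be illustrated with a minimal standalone sketch. The function names hex_to_rgb and rgb_to_hex are hypothetical stand-ins, and this is not necessarily how bioat implements them; it only reproduces the input and output formats documented above.

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert '#RRGGBB' to an (R, G, B) tuple of ints in 0-255."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected '#RRGGBB', got {hex_color!r}")
    # Read the string two hex digits at a time.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb_color: tuple) -> str:
    """Convert an (R, G, B) tuple of ints in 0-255 to '#RRGGBB'."""
    r, g, b = rgb_color
    return f"#{r:02X}{g:02X}{b:02X}"


print(hex_to_rgb("#FFFFAA"))        # (255, 255, 170)
print(rgb_to_hex((255, 255, 170)))  # #FFFFAA
```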

bioat.lib.libcolor.make_color_list(low_color_RGB, high_color_RGB, length_out=20, return_fmt='HEX', log_level='DEBUG')[source]
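
Parameters:
  • low_color_RGB, high_color_RGB – RGB colors in a format like (210, 179, 150), given as a tuple, list, or np.array.

  • return_fmt (str, optional) – Format of the returned colors, HEX or RGB. Defaults to ‘HEX’.

Returns:

color_list, the generated list of colors.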
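
One plausible reading of this function is a linear gradient between the two endpoint colors. The sketch below works under that assumption; the name linear_color_list and the even spacing are illustrative choices, not confirmed behavior of make_color_list.

```python
def linear_color_list(low_rgb, high_rgb, length_out=20, return_fmt="HEX"):
    """Return length_out colors evenly spaced from low_rgb to high_rgb."""
    if length_out < 2:
        raise ValueError("length_out must be at least 2")
    colors = []
    for i in range(length_out):
        t = i / (length_out - 1)  # 0.0 at the low color, 1.0 at the high color
        rgb = tuple(round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb))
        colors.append(rgb)
    if return_fmt.upper() == "RGB":
        return colors
    # Default: format each RGB triple as '#RRGGBB'.
    return ["#{:02X}{:02X}{:02X}".format(*rgb) for rgb in colors]


print(linear_color_list((0, 0, 0), (250, 100, 50), length_out=3))
```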

bioat.lib.libcolor.map_color(value_vec, breaks, color_list)[source]

Parameters:
  • value_vec – np.array or a list of values.

  • breaks – A sorted list of values that splits all real numbers into len(color_list) intervals. For example, [0.01, 0.1, 0.5, 1] splits the real numbers into 5 intervals: (-Inf, 0.01], (0.01, 0.1], (0.1, 0.5], (0.5, 1], (1, +Inf).

  • color_list – A list of HEX-format colors that has to match breaks, with one color per interval.

Returns:

value_color_vec, a list that maps each value in value_vec to a color according to breaks.
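
The interval rule described for breaks (right-closed intervals, one color per interval) can be sketched with the standard-library bisect module. The function name map_values_to_colors and the example colors are hypothetical; this illustrates the lookup rule rather than bioat's actual code.

```python
from bisect import bisect_left


def map_values_to_colors(value_vec, breaks, color_list):
    """Pick one color per value from right-closed intervals defined by breaks."""
    if len(color_list) != len(breaks) + 1:
        raise ValueError("color_list needs exactly len(breaks) + 1 colors")
    # bisect_left returns the number of breaks strictly below the value,
    # so a value equal to a break falls into the interval that ends there.
    return [color_list[bisect_left(breaks, v)] for v in value_vec]


breaks = [0.01, 0.1, 0.5, 1]
colors = ["#FFFFFF", "#FFCCCC", "#FF9999", "#FF6666", "#FF0000"]
print(map_values_to_colors([0.005, 0.01, 0.05, 1, 2], breaks, colors))
```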

bioat.lib.libcolor.map_colors_between_two(base_color, target_color, values)[source]

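Generate colors between two specified colors by mapping numeric values onto them.

Parameters:
  • base_color (str) – Start color, in ‘#RRGGBB’ format.

  • target_color (str) – End color, in ‘#RRGGBB’ format.

  • values (list or array) – Array of values; the values may span any range.

Returns:

Array of the corresponding colors (in #RRGGBB format).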

Return type:

list
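
A minimal sketch of such a mapping follows, assuming the values are min-max normalized and each RGB channel is interpolated linearly. Both choices, and the name colors_between, are assumptions for illustration; the documentation above does not specify the normalization.

```python
def colors_between(base_color, target_color, values):
    """Map each value to a '#RRGGBB' color between base_color and target_color."""
    values = list(values)
    if not values:
        return []
    base = [int(base_color[i:i + 2], 16) for i in (1, 3, 5)]
    target = [int(target_color[i:i + 2], 16) for i in (1, 3, 5)]
    lo, hi = min(values), max(values)
    span = hi - lo
    result = []
    for v in values:
        # Min-max normalization (assumed); constant input maps to base_color.
        t = (v - lo) / span if span else 0.0
        rgb = [round(b + (g - b) * t) for b, g in zip(base, target)]
        result.append("#{:02X}{:02X}{:02X}".format(*rgb))
    return result


print(colors_between("#000000", "#FA6432", [10, 20, 30]))
```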

bioat.lib.libcolor.plot_colortable(colors, *, ncols=4, sort_colors=True, labels=None)[source]

bioat.lib.libcrispr module

bioat.lib.libcrispr.cmp_align_list(aln_a: dict, aln_b: dict)[source]

Compare function for alignment lists.

This function processes alignment data and performs comparisons based on the input structure.

Parameters:

input_data (list | dict) –

The alignment data to be processed. Can be provided as:

  • A list, e.g.: [17, 3, 0, 73.0, 1, 22, ‘AAGAAGAAGACGAGTCTGCA’, ‘||||||||||||||X|||XX’, ‘AAGAAGAAGACGAGCCTGAG’]

  • A dictionary, e.g.: {‘match_count’: 26, ‘mismatch_count’: 3, ‘gap_count’: 4, ‘aln_score’: 96.0, ‘ref_aln_start’: 0, ‘ref_aln_end’: 32, ‘alignment’: {‘reference_seq’: ‘GGCACTGCGGCTGGAAAAAAAAAAAAAAA--GT’, ‘aln_info’: ‘--..|.|||||||||||||||||||||||--||’, ‘target_seq’: ‘--GGCAGCGGCTGGAAAAAAAAAAAAAAAAGGT’}}

Returns:

The function does not return a value but performs comparisons.

Return type:

None

Example

>>> compare_alignments(
...     [
...         17,
...         3,
...         0,
...         73.0,
...         1,
...         22,
...         "AAGAAGAAGACGAGTCTGCA",
...         "||||||||||||||X|||XX",
...         "AAGAAGAAGACGAGCCTGAG",
...     ]
... )
bioat.lib.libcrispr.run_target_seq_align(ref_seq: Seq, target_seq: Seq, aligner: PairwiseAligner, PAM: dict | None = None, log_level='WARNING') list[source]

Perform global alignment for the target sequence.

This function performs global alignment between a reference sequence and a target sequence (usually the sgRNA sequence without PAM) using the specified aligner.

Parameters:
  • ref_seq (Seq) – A Seq object from BioPython representing the reference sequence for the targeted deep sequencing.

  • target_seq (Seq) – A Seq object from BioPython representing the target region sequence for the editing window (usually the sgRNA sequence without PAM).

  • aligner (object) – A PairwiseAligner object from BioPython used for performing the alignment.

  • PAM (dict, optional) –

    A dictionary containing PAM information with the following keys:

    • ‘PAM’ (str): The PAM sequence (e.g., “AGG”).

    • ‘position’ (int): The insertion site of target_seq.

    • ‘weight’ (float): The priority weight for PAM alignment. The alignment score will be multiplied by this weight.

    Example: {‘PAM’: ‘AGG’, ‘position’: 20, ‘weight’: 1.0}.

Returns:

A list of dictionaries, each containing alignment results. Each dictionary has the following keys:

  • ‘match_count’ (int): The number of matching bases in the alignment.

  • ‘mismatch_count’ (int): The number of mismatched bases.

  • ‘gap_count’ (int): The number of gaps.

  • ‘aln_score’ (float): The alignment score.

  • ‘ref_aln_start’ (int): The starting position of the alignment on the reference sequence.

  • ‘ref_aln_end’ (int): The ending position of the alignment on the reference sequence.

  • ‘alignment’ (dict): The alignment details, containing:

    • ‘reference_seq’ (str): The aligned reference sequence.

    • ‘aln_info’ (str): The alignment information (e.g., matches, mismatches, and gaps).

    • ‘target_seq’ (str): The aligned target sequence.

Return type:

list[dict]

Example

An example alignment result might look like this:

[
    {
        'match_count': 26,
        'mismatch_count': 3,
        'gap_count': 4,
        'aln_score': 96.0,
        'ref_aln_start': 0,
        'ref_aln_end': 32,
        'alignment': {
            'reference_seq': 'GGCACTGCGGCTGGAAAAAAAAAAAAAAA--GT',
            'aln_info':      '--..|.|||||||||||||||||||||||--||',
            'target_seq':    '--GGCAGCGGCTGGAAAAAAAAAAAAAAAAGGT'
        }
    }
]
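
To show how the counts in such a result relate to the aligned strings, here is a small standalone sketch that derives aln_info, match_count, mismatch_count, and gap_count from two already-aligned sequences. The helper summarize_alignment is hypothetical and does not call Biopython or bioat; it assumes the ‘|’, ‘.’, and ‘-’ symbols used by get_aligned_seq.

```python
def summarize_alignment(reference_seq: str, target_seq: str) -> dict:
    """Count matches, mismatches and gaps between two gapped, aligned strings."""
    if len(reference_seq) != len(target_seq):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for ref_base, tgt_base in zip(reference_seq, target_seq):
        if ref_base == "-" or tgt_base == "-":
            symbols.append("-")  # gap in either sequence
        elif ref_base == tgt_base:
            symbols.append("|")  # match
        else:
            symbols.append(".")  # mismatch
    aln_info = "".join(symbols)
    return {
        "match_count": aln_info.count("|"),
        "mismatch_count": aln_info.count("."),
        "gap_count": aln_info.count("-"),
        "alignment": {
            "reference_seq": reference_seq,
            "aln_info": aln_info,
            "target_seq": target_seq,
        },
    }


ref = "GGCACTGCGGCTGG" + "A" * 15 + "--GT"
tgt = "--GGCAGCGGCTGG" + "A" * 16 + "GGT"
print(summarize_alignment(ref, tgt))
```

For the sequences above, the sketch gives 26 matches, 3 mismatches, and 4 gaps, which agrees with the example result.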

bioat.lib.libdataclasses module

class bioat.lib.libdataclasses.Assembly(path: str)[source]

Bases: object

contigs: list[Fasta]
length: int
path: str
class bioat.lib.libdataclasses.Bam[source]

Bases: object

class bioat.lib.libdataclasses.Bed(chromosome: str, start: int, end: int, name: str, score: int, strand: str)[source]

Bases: object

chromosome: str
end: int
name: str
score: int
start: int
strand: str
class bioat.lib.libdataclasses.Fasta(header: str, sequence: str)[source]

Bases: object

Usage:

>>> fa = Fasta('this_header', 'AGCTACGTCCCCTGA')
>>> fa
>>> print(fa)

header: str
property length
sequence: str
class bioat.lib.libdataclasses.Fastq(header: str, sequence: str, extras: str, quality: str)[source]

Bases: Fasta

extras: str
header: str
is_valid()[source]
quality: str
sequence: str
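
The field layout of Fasta and Fastq can be mirrored with standard-library dataclasses. The class names FastaRecord and FastqRecord are hypothetical, and the is_valid rule shown here (one quality character per base) is an assumption; the reference above does not state what bioat's is_valid checks.

```python
from dataclasses import dataclass


@dataclass
class FastaRecord:
    header: str
    sequence: str

    @property
    def length(self) -> int:
        """Number of characters in the sequence."""
        return len(self.sequence)


@dataclass
class FastqRecord(FastaRecord):
    extras: str
    quality: str

    def is_valid(self) -> bool:
        # Assumed rule: the quality string has one character per base.
        return len(self.sequence) == len(self.quality)


fa = FastaRecord("this_header", "AGCTACGTCCCCTGA")
print(fa, fa.length)
fq = FastqRecord("read_1", "ACGT", "+", "IIII")
print(fq.is_valid())
```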
class bioat.lib.libdataclasses.VCF[source]

Bases: object

bioat.lib.libdetect_seq module

bioat.lib.libfastx module

bioat.lib.libfastx.calculate_length_distribution(file, table=None, image=None, plt_show=False, log_level='WARNING')[source]
bioat.lib.libfastx.cas13_finder(input_faa: str, output_faa: str | None = None, lmin: int | None = None, lmax: int | None = None, log_level='DEBUG') None[source]
bioat.lib.libfastx.cas_finder(input_fa: str, output_faa: str | None = None, output_contig_fa: str | None = None, output_crispr_info_tab: str | None = None, lmin: int | None = None, lmax: int | None = None, extend: int = 10000, temp_dir: str | None = None, prodigal: str | None = None, prodigal_mode: str = 'meta', pilercr: str | None = None, rm_temp: bool = True, log_level='INFO') None[source]
bioat.lib.libfastx.format_this_fastx(old_file: str, new_file: str | None = None, force: bool = False, log_level: str = 'DEBUG')[source]

bioat.lib.libjgi module

class bioat.lib.libjgi.JGIConfig(overwrite_conf: bool = False)[source]

Bases: object

FILENAME_CONFIG_PATH = '/home/docs/.bioat/JGI/account.conf'
FILENAME_TEMPLATE_LOG_FAIL = 'jgi-xml-query.failed_{}.log'
FILENAME_TEMPLATE_XML = 'jgi-xml-query.result_{}.xml'
URL_JGI_FETCH_XML = 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory'
URL_JGI_LOGIN = 'https://signon.jgi.doe.gov/signon/create'
URL_JGI_MAIN = 'https://genome.jgi.doe.gov'
input_user_info()[source]

Dialog with user to gather user information for use with the curl query. Returns a dict.

load_config()[source]

Reads “user”, “password” and “categories” entries from config file.

save_config()[source]

Creates a config file <config_path> using credentials from dict <config_info>.

class bioat.lib.libjgi.JGIDoc[source]

Bases: object

DEFAULT_CATEGORIES = ['ESTs', 'EST Clusters', 'Assembled scaffolds (unmasked)', 'Assembled scaffolds (masked)', 'Transcripts', 'Genes', 'CDS', 'Proteins', 'Additional Files']
select_blurb = "        # SYNTAX ///////////////////////////////////////////////////////////////////////\n        Using the following format syntax to download selected file:\n            <category number>:<i>[,<i>, <i>];<category number>:<i>-<i>;...\n\n        Indices (<i>) may be a mixture of comma-separated values and hyphen-separated\n        ranges.\n\n        Example:\n            '3:4,5; 7:1-10,13' will select elements 4 and 5 from category 3, and 1-10\n            plus 13 from category 7.\n        # /SYNTAX ///////////////////////////////////////////////////////////////////////\n        "
usage_example_blurb = '        This tool will retrieve files from JGI\n        It will return a list of possible files for downloading.\n\n        To get <jgi_address>, go to: http://genome.jgi.doe.gov/ and search for your\n        species of interest. Click through until you are at the "Info" page. For\n        \x1b[3mNematostella vectensis\x1b[23m, the appropriate page is\n        "http://genome.jgi.doe.gov/Nemve1/Nemve1.info.html".\n\n        To query using only the name simply requires the specific JGI organism\n        abbreviation, as referenced in the full url.\n\n        For the above example, the proper input syntax for this script would be:\n\n        $ bioat meta JGI_query -q http://genome.jgi.doe.gov/Nemve1/Nemve1.info.html\n\n                                 -or-\n\n        $ bioat meta JGI_query -q Nemve1\n\n        If you already have the XML file for the query in the directory, you may use\n        the --xml flag to avoid redownloading it (particularly useful if querying\n        large, top-level groups with many sub-species, such as "fungi"):\n\n        $ bioat meta JGI_query -x <your_xml_index>\n\n        If the XML filename is omitted when using the -x/--xml flag, it is assumed that\n        the XML file is named "jgi-xml-query.result_<organism-name>.xml". In such cases, the\n        organism name is required.\n        '
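
The selection syntax described in select_blurb can be parsed along the following lines. This is a sketch under the assumption that ranges are inclusive, as the blurb's own example suggests; the function name parse_selection is hypothetical and is not part of the documented libjgi API.

```python
def parse_selection(selection: str) -> dict:
    """Parse '<category>:<i>[,<i>];<category>:<i>-<i>' into {category: [indices]}."""
    chosen = {}
    for part in selection.split(";"):
        part = part.strip()
        if not part:
            continue
        category, _, indices = part.partition(":")
        picked = chosen.setdefault(int(category), [])
        for token in indices.split(","):
            token = token.strip()
            if "-" in token:
                # Hyphen-separated range, treated as inclusive on both ends.
                first, last = (int(x) for x in token.split("-"))
                picked.extend(range(first, last + 1))
            else:
                picked.append(int(token))
    return chosen


print(parse_selection("3:4,5; 7:1-10,13"))
```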
class bioat.lib.libjgi.JGIOperator(query_info: str | None = None, xml: str | None = None, log_fails: str | None = None, nretry: int = 4, timeout: int = -1, regex: str | None = None, all_get: bool = False, overwrite_conf: bool = False, filter_files: bool = False, proxy_pool: str | None = None, just_query_xml: bool = False, syntax_help: bool = False, usage: bool = False, log_level: str = 'INFO')[source]

Bases: object

download()[source]
parse_xml()[source]

Moves through the xml document <xml_file> and returns information about matches to elements in <DESIRED_CATEGORIES> if <filter_categories> is True, or all files otherwise.

query()[source]

bioat.lib.libpandas module

author: Herman Huanan Zhao email: hermanzhaozzzz@gmail.com homepage: https://github.com/hermanzhaozzzz

example 1:
bioat list
<in shell>:

$ bioat list

<in python console>:
>>> from bioat.cli import Cli
>>> bioat = Cli()
>>> bioat.list()
>>> print(bioat.list())

bioat.lib.libpandas.set_option(progres_bar: bool = True, max_colwidth: int = 40, display_width: int = 120, display_max_columns: int | None = None, display_max_rows: int = 50, log_level='INFO')[source]

bioat.lib.libpatentseq module

author: Herman Huanan Zhao email: hermanzhaozzzz@gmail.com homepage: https://github.com/hermanzhaozzzz

example 1:
bioat list
<in shell>:

$ bioat list

<in python console>:
>>> from bioat.cli import Cli
>>> bioat = Cli()
>>> bioat.list()
>>> print(bioat.list())

bioat.lib.libpatentseq.run(playwright, username, password, seq, seq_header, proxy_server, output, headless, nretry, local_browser, rm_fail_cookie, log_level) None[source]

bioat.lib.libpath module

bioat.lib.libpath.check_cmd(x, log_level='WARNING') bool[source]

Check if a command is available in the system’s PATH.

Parameters:

x (str) – The command name to check.

Returns:

True if the command is executable and found in PATH, False otherwise.

Return type:

bool
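
A PATH lookup of this kind can be done with the standard library's shutil.which. The sketch below is one common way to perform the check, not necessarily what check_cmd does internally; command_available is a hypothetical name.

```python
import shutil


def command_available(name: str) -> bool:
    """Return True if an executable called name is found on PATH."""
    # shutil.which returns the full path of the executable, or None.
    return shutil.which(name) is not None


print(command_available("prodigal"))
print(command_available("surely-not-an-installed-command"))
```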

bioat.lib.libpath.check_executable(x: str | None, name: str | None, log_level: str = 'WARNING') None[source]

bioat.lib.libpdb module

TODO.

bioat.lib.libpdb.get_cut2ref_aln_info(ref: str | Path | Structure, cut: str | Path | Structure, cal_rmsd=True, cal_tmscore=False, label1='ref', label2='cut', usalign_bin: str | Path = 'usalign', log_level='WARNING') dict[source]

Align a cut (truncated) PDB structure to the reference PDB structure using the Cα atoms.

Aligns a truncated protein structure (cut) to its full-length reference structure (ref) using Cα atoms and Biopython’s Superimposer.

This function:
  • Extracts all Cα atoms from ref and cut.

  • Removes atoms from ref at the indices listed in gap_indices.

  • Aligns the remaining atoms from cut to the corresponding positions in ref.

  • Modifies the cut structure in-place to match the aligned orientation.

  • Returns both structures and the RMSD value of the alignment.

It assumes:
  • One-to-one correspondence between residues after gap removal.

  • Structures are predicted by AlphaFold2 / ESMFold (no missing atoms).

Parameters:
  • ref (str or Bio.PDB.Structure.Structure) – Reference structure path or loaded Structure.

  • cut (str or Bio.PDB.Structure.Structure) – Truncated structure path or loaded Structure.

  • cal_rmsd (bool, optional) – Whether to calculate RMSD. Default is True.

  • cal_tmscore (bool, optional) – Whether to calculate TM-score using USalign. Default is False.

  • label1 (str, optional) – Name for the reference structure. Default is “ref”.

  • label2 (str, optional) – Name for the cut structure. Default is “cut”.

  • usalign_bin (str or Path, optional) – Path to the USalign binary for TM-score calculation. Default is “usalign”.

  • log_level (str, optional) – Logging level. Default is “WARNING”.

Returns:

{
    "{label1}": aln label1 structure,    # if cal_rmsd is True, unaltered label1 structure
    "{label2}": fixed label2 structure,  # if cal_rmsd is True, fix label2 coords in-place
    "RMSD": 0.123,                       # if cal_rmsd is True, the RMSD value between label1 and label2
    f"{label1}_seq": ref_seq,            # if cal_rmsd is True, the sequence of label1 structure
    f"{label2}_seq": cut_seq,            # if cal_rmsd is True, the sequence of label2 structure
    "alignment_dict": alignment_dict,    # if cal_rmsd is True, the alignment dict of label1 and label2
    "gap_indices": gap_indices,          # if cal_rmsd is True, the indices of gaps in label1 structure
    "TM-score:mean": 0.623,              # if cal_tmscore is True, the mean TM-score value
    "TM-score:TM1": 0.456,               # if cal_tmscore is True, use label1 as ref <L_N> in calculation
    "TM-score:TM2": 0.789,               # if cal_tmscore is True, use label2 as ref <L_N> in calculation
    …
}

Return type:

dict
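
The gap-removal and RMSD steps can be sketched in pure Python. This simplified example assumes the two coordinate sets are already superimposed, so it omits the rotation and translation that Biopython's Superimposer computes; the function name and the coordinates are hypothetical.

```python
import math


def rmsd_after_gap_removal(ref_coords, cut_coords, gap_indices):
    """RMSD between cut atoms and the ref atoms that remain after dropping gaps."""
    gaps = set(gap_indices)
    # Drop the reference atoms at the gap positions.
    kept = [xyz for i, xyz in enumerate(ref_coords) if i not in gaps]
    if not kept or len(kept) != len(cut_coords):
        raise ValueError("need a non-empty one-to-one correspondence after gap removal")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(kept, cut_coords))
    return math.sqrt(squared / len(kept))


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
cut = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd_after_gap_removal(ref, cut, gap_indices=[1]))
```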

bioat.lib.libpdb.load_structure(structure: str | Path | Structure, label: str | None = None) tuple[Structure, str][source]

Load a PDB structure file or a BiopythonStructure object.

Parameters:
  • structure (str | Path | BiopythonStructure) – Structure file path or BiopythonStructure object.

  • label (str | None, optional) – Structure label. If None, the label will be set to the file name. Defaults to None.

Raises:

BioatInvalidParameterError – If the input structure is not a valid file path or a BiopythonStructure object.

Returns:

A tuple of BiopythonStructure object and label.

Return type:

tuple[BiopythonStructure, str]

bioat.lib.libpdb.show_ref_cut(ref_seq: str | Path | Seq, ref_pdb: str | Path | Structure, cut_seq: list[str | Path | Seq] | str | Path | Seq | None = None, cut_pdb: list[str | Path | Structure] | str | Path | Structure | None = None, cut_labels: list[str] | str | None = None, ref_color: str = 'red', ref_map_colors: tuple[str, str] | None = None, ref_map_values: dict | None = None, cut_color='lightgray', gap_color='purple', ref_style='cartoon', cut_style='cartoon', gap_style='cartoon', ref_map_value_random: bool = False, output_fig: str | Path | None = None, col: int = 4, scale: float = 1.0, annotate: bool = True, text_interval: int = 5, log_level='WARNING')[source]

Visualizes the alignment of sequences and highlights changes in PDB structures using py3Dmol.

Parameters:
  • ref_seq (str or Seq) – Amino acid sequence content for the ref protein.

  • ref_pdb (str or BiopythonStructure) – Path to the PDB file of the reference structure.

  • cut_seq (str, Seq or None, optional) – Amino acid sequence content for the cut protein.

  • cut_pdb (str, BiopythonStructure or None, optional) – Path to the PDB file of the cut structure.

  • cut_labels (list[str] or str or None, optional) – Label for the cut proteins. If None, the label will be set to “cut”.

  • ref_color (str, optional) – Color for reference residues.

  • ref_map_colors (tuple[str, str] or None, optional) – ref_map_colors will be used as color bar from ref_map_colors[0] to ref_map_colors[1]. If None, do not apply color mapping. Defaults to None.

  • ref_map_values (dict or None, optional) – A dictionary of values for the ref color map; the values will be normalized to the range [0, 1]. If None, all residues will be colored with the same color. For example: ref_map_values = {‘V_0’: 0.4177215189873418, ‘S_1’: 0.8185654008438819, ‘K_2’: 0.9915611814345991, ‘G_3’: 0.42616033755274263, …}

  • cut_color (str, optional) – Color for cut residues.

  • gap_color (str, optional) – Color for gaps or removed residues.

  • ref_style (str, optional) – “stick”, “sphere”, “cartoon”, or “line”

  • cut_style (str, optional) – “stick”, “sphere”, “cartoon”, or “line”

  • gap_style (str, optional) – “stick”, “sphere”, “cartoon”, or “line”

  • ref_map_value_random (bool, optional) – If True, ref_map_values will be randomly generated. Defaults to False.

  • output_fig (str or Path or None, optional) – Output figure file path. If None, the figure will not be saved in html format. Defaults to None.

  • col (int, optional) – Number of columns for the visualization. Defaults to 4.

  • scale (float, optional) – Scale factor for the visualization. Defaults to 1.0.

  • annotate (bool, optional) – Whether to annotate the visualization with labels. Defaults to True.

  • text_interval (int, optional) – The interval between text annotations. Defaults to 5.

  • log_level (str, optional) – Log level. Defaults to “WARNING”.
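
A dictionary in the shape shown for ref_map_values could be assembled as follows. The ‘<residue>_<index>’ key pattern with 0-based indices is inferred from the example keys (‘V_0’, ‘S_1’, ...), and the min-max normalization is an assumed stand-in for the normalization mentioned above; build_ref_map_values and the scores are hypothetical.

```python
def build_ref_map_values(ref_seq: str, scores) -> dict:
    """Build {'<residue>_<index>': value} with values min-max scaled to [0, 1]."""
    scores = list(scores)
    if len(ref_seq) != len(scores):
        raise ValueError("need exactly one score per residue")
    if not scores:
        return {}
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return {
        # Key pattern inferred from the documented example, 0-based index.
        f"{residue}_{i}": ((score - lo) / span if span else 0.0)
        for i, (residue, score) in enumerate(zip(ref_seq, scores))
    }


print(build_ref_map_values("VSKG", [1, 3, 5, 2]))
```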

bioat.lib.libpdb.structure2string(structure: Structure) str[source]

Convert BiopythonStructure object to string format.

Parameters:

structure (BiopythonStructure) – Structure object to convert.

Returns:

PDB content as a string.

Return type:

str

bioat.lib.libphylo module

bioat.lib.libplot module

bioat.lib.libplot.

author: Herman Huanan Zhao email: hermanzhaozzzz@gmail.com homepage: https://github.com/hermanzhaozzzz

This module provides functions for plotting.

example 1:
init_matplotlib
<in python console>:
>>> import matplotlib.pyplot as plt
>>> from bioat.lib.libplot import init_matplotlib
>>> init_matplotlib(log_level="info")
>>> plt.plot([1, 2, 3], [4, 5, 6])
>>> plt.show()
example 2:
plot_colortable
<in python console>:
>>> import matplotlib.pyplot as plt
>>> from bioat.lib.libplot import plot_colortable
>>> colors = ["#64C1E8", "#80CED7", "#63C7B2", "#8E6C88", "#CA61C3", "#FF958C", "#883677"]
>>> plot_colortable(colors, ncols=1, labels=[1, 2, 3, 4, 5, 6, 7])
>>> plt.show()
bioat.lib.libplot.init_matplotlib(font='Helvetica', refresh=False, sns_context='paper', sns_style='white', sns_palette=('#8FA58E', '#8BA4B8', '#C8B6A6', '#B4867A', '#A792A6', '#D0C2A4', '#A7B6A3', '#7D8C99'), sns_font_scale=1.0, figure_dpi=300, log_level='INFO', **kwargs)[source]

Easily set matplotlib style.

Parameters:
  • font (str, optional) – font to use in matplotlib, defaults to ‘Helvetica’

  • refresh (bool, optional) – whether to remove matplotlib font cache and reload bundled fonts, defaults to False

  • sns_context (str, optional) – seaborn context, defaults to ‘paper’

  • sns_style (str, optional) – seaborn style, defaults to ‘white’

  • sns_palette (str or sequence, optional) – seaborn palette, defaults to a muted Morandi palette

  • sns_font_scale (float, optional) – seaborn font scale, defaults to 1.0

  • figure_dpi (int, optional) – default figure resolution, defaults to 300

  • log_level (str, optional) – log level, defaults to ‘INFO’

Raises:
bioat.lib.libplot.plot_colortable(colors, *, ncols=4)[source]

bioat.lib.libsnakemake module

bioat.lib.libsnakemake.check_cmd(x)[source]
bioat.lib.libsnakemake.check_read(x)[source]
bioat.lib.libsnakemake.print_head(SAMPLES, MODE)[source]

bioat.lib.libspider module

class bioat.lib.libspider.ProxyPool(url)[source]

Bases: object

delete_proxy(proxy)[source]
get_proxy()[source]
bioat.lib.libspider.get_random_user_agents()[source]

Module contents