Welcome to AbNumber’s documentation!¶

Antibody chain representation, alignment and numbering using ANARCI

Getting Started¶

Install AbNumber using Bioconda:

conda install -c bioconda abnumber

Credits¶

This tool is based on ANARCI. Please cite the ANARCI paper: ANARCI: antigen receptor numbering and receptor classification

Examples¶

See the Example Jupyter Notebook for usage examples.

Chain¶

class abnumber.Chain(sequence, scheme, cdr_definition=None, name=None, assign_germline=False, allowed_species=None, **kwargs)¶

Antibody chain aligned to a chosen antibody numbering scheme

Example:

>>> from abnumber import Chain
>>>
>>> seq = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAPSVYPLA'
>>> chain = Chain(seq, scheme='imgt')
>>> chain
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^

Chain can be iterated:

>>> for pos, aa in chain:
...     print(pos, aa)
H1  Q
H2  V
H3  Q
H4  L
H5  Q
...

Chain can also be indexed and sliced using scheme numbering:

>>> chain['5']
'Q'
>>> for pos, aa in chain['H2':'H5']:
...     print(pos, aa)
H2  V
H3  Q
H4  L
H5  Q

Parameters:

sequence – Unaligned string sequence
name – Optional sequence identifier
scheme – Numbering scheme: One of imgt, chothia, kabat, aho
cdr_definition – Numbering scheme to be used for definition of CDR regions. Same as scheme by default. One of imgt, chothia, kabat, north. Required when scheme is aho.
assign_germline – Assign germline name using ANARCI based on best sequence identity
allowed_species – Allowed species for germline assignment. Use None to allow all species, or one or more of: 'human', 'mouse', 'rat', 'rabbit', 'rhesus', 'pig', 'alpaca'
aa_dict – (Internal use only) Create Chain object directly from dictionary of region objects
tail – (Internal use only) Constant region sequence
species – (Internal use only) Species as identified by ANARCI
germline – (Internal use only) Germline as identified by ANARCI

align(*other) → Alignment¶

Align this chain to other chains by using their existing numbering

>>> from abnumber import Chain
>>>
>>> seq1 = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAP'
>>> chain1 = Chain(seq1, scheme='imgt')
>>>
>>> seq2 = 'QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYDDYLDRWGQGTTLTVSSAKTTAP'
>>> chain2 = Chain(seq2, scheme='imgt')
>>>
>>> alignment = chain1.align(chain2)
>>> print(alignment.format())
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
||||.||||||.||||+|||||||||||.||||||||||||||||+||||||||.|.||||||||||||||||||||||||||.+|||||||||||||||||....||.|||||||||||
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^^                                      ^^^^^^^^^^^^

Parameters: other – The Chain object to align, can be repeated to create a multiple sequence alignment
Returns: Alignment object
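
Aligning "by existing numbering" means that residues sharing a position label end up in the same column, and a chain lacking a position gets a gap. The following is a minimal plain-Python sketch of that idea. It does not call AbNumber, the position labels and the helper names (`parse_label`, `align_by_numbering`) are made up for illustration, and scheme-specific ordering rules for insertions are not modelled.

```python
def parse_label(label):
    """Split a label such as '55' or '111A' into (number, insertion letter)."""
    digits = "".join(ch for ch in label if ch.isdigit())
    return int(digits), label[len(digits):]


def align_by_numbering(*chains):
    """Align dicts of {position label: amino acid} on the union of their positions."""
    positions = sorted({pos for chain in chains for pos in chain}, key=parse_label)
    # A chain that lacks a position contributes a gap character in that column.
    rows = ["".join(chain.get(pos, "-") for pos in positions) for chain in chains]
    return positions, rows


# Toy fragments with illustrative labels (not real ANARCI output).
chain_a = {"53": "P", "54": "S", "56": "R", "57": "G"}
chain_b = {"53": "P", "54": "S", "55": "D", "56": "R", "57": "S"}

positions, rows = align_by_numbering(chain_a, chain_b)
for row in rows:
    print(row)
```

Because no pairwise alignment is computed, the result depends only on how each chain was numbered, which is why both chains should use the same scheme.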

property cdr1_seq¶: Unaligned string representation of the CDR 1 region sequence

property cdr2_seq¶: Unaligned string representation of the CDR 2 region sequence

property cdr3_seq¶: Unaligned string representation of the CDR 3 region sequence

cdr_definition: str¶: Numbering scheme to be used for definition of CDR regions (same as scheme by default)

chain_type: str¶

Chain type as identified by ANARCI: H (heavy), K (kappa light) or L (lambda light)

clone(replace_seq: Optional[str] = None)¶

Create a copy of this chain, optionally with a replacement sequence

Parameters: replace_seq – Optional replacement sequence, needs to be the same length
Returns: new Chain object

find_human_germlines(limit=10, v_gene=None, j_gene=None, unique=True) → Tuple[List[Chain], List[Chain]]¶

Find most identical V and J germline sequences based on IMGT alignment

Parameters:

limit – Number of best matching germlines to return
v_gene – Filter germlines to specific V gene name
j_gene – Filter germlines to specific J gene name
unique – Skip germlines with duplicate amino acid sequence

Returns:

list of top V chains, list of top J chains
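
The roles of limit and unique can be illustrated with a small ranking sketch. This is not AbNumber's implementation: the candidate names and sequences are toy values, identity is simplified to a count of matching characters between equal-length strings, and `rank_by_identity` is a hypothetical helper.

```python
def rank_by_identity(query, candidates, limit=10, unique=True):
    """Return candidate names ordered by number of residues identical to the query."""
    seen = set()
    scored = []
    for name, seq in candidates.items():
        if unique and seq in seen:
            continue  # skip candidates whose amino acid sequence was already seen
        seen.add(seq)
        matches = sum(a == b for a, b in zip(query, seq))
        scored.append((matches, name))
    # Stable sort: ties keep the order in which candidates were given.
    scored.sort(key=lambda item: -item[0])
    return [name for _, name in scored[:limit]]


# Toy data, not real germline sequences.
query = "QVQL"
candidates = {"toy-A": "QVQL", "toy-B": "EVQL", "toy-C": "QVQL", "toy-D": "EVKL"}

top_unique = rank_by_identity(query, candidates, limit=2)
top_all = rank_by_identity(query, candidates, limit=2, unique=False)
print(top_unique, top_all)
```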

find_merged_human_germline(top=0, v_gene=None, j_gene=None) → Chain¶

Find n-th most identical V and J germline sequence based on IMGT alignment and merge them into one Chain

Parameters:

top – Index of the germline to return, ranked by identity (0-indexed, so 0 is the most identical)
v_gene – Filter germlines to specific V gene name
j_gene – Filter germlines to specific J gene name

Returns:

merged germline sequence Chain object

format(method='wide', **kwargs)¶

Format sequence to string

Parameters: method – use "wide" for Chain.format_wide() or "tall" for Chain.format_tall()
Returns: formatted string

format_tall(columns=5)¶

Create string with one position per line, showing position numbers and amino acids

Returns: formatted string

format_wide(numbering=False)¶

Create string with sequence on first line and CDR regions highlighted with ^ on second line

Parameters: numbering – Add position numbers on top
Returns: formatted string
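
The two-line "wide" layout can be reproduced from a list of regions. The sketch below is plain Python and does not call AbNumber. The region split is read off the ^ markers in the Chain example at the top of this page, and `format_wide_sketch` is a hypothetical helper that ignores the numbering option.

```python
# Region split of the example heavy chain, as implied by the ^ markers shown earlier.
regions = [
    ("FR1", "QVQLQQSGAELARPGASVKMSCKAS"),
    ("CDR1", "GYTFTRYT"),
    ("FR2", "MHWVKQRPGQGLEWIGY"),
    ("CDR2", "INPSRGYT"),
    ("FR3", "NYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYC"),
    ("CDR3", "ARYYDDHYCLDY"),
    ("FR4", "WGQGTTLTVSS"),
]


def format_wide_sketch(regions):
    """Return the sequence on one line and ^ under every CDR residue on the next."""
    sequence = "".join(seq for _, seq in regions)
    marks = "".join(
        ("^" if name.startswith("CDR") else " ") * len(seq) for name, seq in regions
    )
    # Trailing spaces under FR4 carry no information, so they are dropped here.
    return sequence + "\n" + marks.rstrip()


wide = format_wide_sketch(regions)
print(wide)
```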

property fr1_seq¶: Unaligned string representation of the Framework 1 region sequence

property fr2_seq¶: Unaligned string representation of the Framework 2 region sequence

property fr3_seq¶: Unaligned string representation of the Framework 3 region sequence

property fr4_seq¶: Unaligned string representation of the Framework 4 region sequence

classmethod from_fasta(path_or_handle, scheme, cdr_definition=None, as_series=False, as_generator=False, **kwargs) → Union[List[Chain], Series, Generator[Chain, None, None]]¶: Read multiple chains from FASTA

get_position_by_raw_index(index)¶: Get Position object at corresponding raw numeric position

graft_cdrs_onto(other: Chain, backmutate_vernier=False, backmutations: List[Union[Position, str]] = [], name: Optional[str] = None) → Chain¶

Graft CDRs from this Chain onto another chain

Parameters:

other – Chain to graft CDRs into (source of frameworks and tail sequence)
backmutate_vernier – Also graft all Kabat Vernier positions from this chain (perform backmutations)
backmutations – List of positions that should additionally be grafted from this chain (str or Position)
name – Name of new Chain. If not provided, use name of this chain.

Returns:

Chain with CDRs grafted from this chain and frameworks from the given chain
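
Conceptually, grafting keeps the frameworks of the acceptor chain and swaps in the CDRs of the donor chain. A minimal plain-Python sketch of that region-level swap follows. It does not call AbNumber, the framework strings are placeholders rather than real sequences, and `graft_cdrs` is a hypothetical helper that ignores Vernier positions and other backmutations.

```python
CDR_NAMES = ("CDR1", "CDR2", "CDR3")


def graft_cdrs(donor, acceptor):
    """Take CDR regions from the donor and all other regions from the acceptor."""
    return {
        name: donor[name] if name in CDR_NAMES else acceptor_seq
        for name, acceptor_seq in acceptor.items()
    }


# Placeholder regions: lowercase strings stand in for framework sequences.
donor = {
    "FR1": "dddd", "CDR1": "GYTFTRYT", "FR2": "dddd", "CDR2": "INPSRGYT",
    "FR3": "dddd", "CDR3": "ARYYDDHYCLDY", "FR4": "dddd",
}
acceptor = {
    "FR1": "aaaa", "CDR1": "xxxx", "FR2": "aaaa", "CDR2": "xxxx",
    "FR3": "aaaa", "CDR3": "xxxx", "FR4": "aaaa",
}

grafted = graft_cdrs(donor, acceptor)
grafted_seq = "".join(grafted.values())
print(grafted_seq)
```

A backmutation would correspond to copying individual framework positions from the donor as well, which requires per-position data rather than whole-region strings.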

graft_cdrs_onto_human_germline(v_gene=None, j_gene=None, backmutate_vernier=False, backmutations: List[Union[Position, str]] = [])¶

Graft CDRs from this Chain onto the nearest human germline sequence

Parameters:

v_gene – Use defined V germline allele (e.g. IGHV1-18*01), gene (e.g. IGHV1-18) or family (e.g. IGHV1)
j_gene – Use defined J germline allele (e.g. IGHJ1*01) or gene (e.g. IGHJ1)
backmutate_vernier – Also graft all Kabat Vernier positions from this chain (perform backmutations)
backmutations – List of positions that should additionally be grafted from this chain (str or Position)

Returns:

Chain with CDRs grafted from this chain and frameworks from the nearest human germline sequence

is_heavy_chain()¶: Check if this chain is heavy chain (chain_type=="H")

is_kappa_light_chain()¶: Check if this chain is kappa light chain (chain_type=="K")

is_lambda_light_chain()¶: Check if this chain is lambda light chain (chain_type=="L")

is_light_chain()¶: Check if this chain is light chain (chain_type=="K" or chain_type=="L")

j_gene: str¶: J gene germline as identified by ANARCI (if assign_germline is True)

name: str¶: User-provided sequence identifier

property positions¶: Dictionary of Position -> Amino acid

print(method='wide', **kwargs)¶

Print string representation using Chain.format()

By default, produces “wide” format with sequence on first line and CDR regions highlighted with ^ on second line:

>>> chain.print()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^

Parameters: method – use "wide" for Chain.format_wide() or "tall" for Chain.format_tall()

print_tall(columns=5)¶

Print string representation using Chain.format_tall()

>>> chain.print_tall()
FR1 H1    Q
FR1 H2    V
FR1 H3    Q
FR1 H4    L
FR1 H5    Q
FR1 H6    Q
FR1 H7    S
...

print_wide(numbering=False)¶

Print string representation using Chain.format_wide()

>>> chain.print_wide()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^

property raw¶

Access raw representation of this chain to allow unaligned numeric indexing and slicing

>>> # String numbering is based on scheme numbering
>>> chain['1']
'Q'
>>> # Numbering of chain.raw starts at 0
>>> chain.raw[0]
'Q'
>>> # Slicing with string is based on scheme numbering, the end is inclusive
>>> chain['1':'10']
'QVQLQQSGAE'
>>> # Slicing with chain.raw starts at 0, the end is exclusive (Python style)
>>> chain.raw[0:10]
'QVQLQQSGAE'

Returns: Raw chain accessor that can be sliced or indexed to produce a new Chain object
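
The difference between the two slicing conventions (inclusive end for scheme labels, exclusive end for raw indices) is easy to get wrong, so here is a small plain-Python sketch of both. It does not call AbNumber. The labels are illustrative, and `slice_by_label` and `slice_raw` are hypothetical helpers.

```python
# Toy numbered chain: (position label, amino acid) pairs in scheme order.
numbered = [("H1", "Q"), ("H2", "V"), ("H3", "Q"), ("H4", "L"), ("H5", "Q"), ("H6", "Q")]


def slice_by_label(pairs, start, stop, stop_inclusive=True):
    """Slice by position label; the stop label is included by default."""
    labels = [label for label, _ in pairs]
    begin = labels.index(start)
    end = labels.index(stop) + (1 if stop_inclusive else 0)
    return pairs[begin:end]


def slice_raw(pairs, start, stop):
    """Slice by 0-based raw index; the stop index is excluded (Python style)."""
    return pairs[start:stop]


by_label = "".join(aa for _, aa in slice_by_label(numbered, "H2", "H5"))
by_index = "".join(aa for _, aa in slice_raw(numbered, 1, 5))
print(by_label, by_index)
```

Both calls select the same four residues: the label slice names its last position, while the raw slice names the index one past it.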

property regions¶

Dictionary of region dictionaries

Region is an uppercase string, one of: "FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"

Returns: Dictionary of Region name -> Dictionary of (Position -> Amino acid)

renumber(scheme=None, cdr_definition=None, allowed_species=None)¶

Return copy of this chain aligned using a different numbering scheme or CDR definition

Parameters:

scheme – Change numbering scheme: One of imgt, chothia, kabat, aho.
cdr_definition – Change CDR definition scheme: One of imgt, chothia, kabat, north.
allowed_species – None to allow all species, or one or more of: 'human', 'mouse', 'rat', 'rabbit', 'rhesus', 'pig', 'alpaca'

scheme: str¶: Numbering scheme used to align the sequence

property seq¶

Unaligned string representation of the variable chain sequence

Returns: Unaligned string representation of the variable chain sequence

slice(replace_seq: Optional[str] = None, start: Optional[Union[str, int, Position]] = None, stop: Optional[Union[str, int, Position]] = None, stop_inclusive: bool = True, allow_raw: bool = False)¶

Create a slice of this chain, optionally with a replacement sequence that is placed into the same numbering

You can also slice directly using chain['111':'112A'] or chain.raw[10:20].

Parameters:

replace_seq – Optional replacement sequence, needs to be the same length
start – Optional slice start position (inclusive), Position or string (e.g. ‘111A’)
stop – Optional slice stop position (inclusive by default, see stop_inclusive), Position or string (e.g. ‘112A’)
stop_inclusive – Include stop position in slice
allow_raw – Allow unaligned numeric indexing from 0 to length of sequence - 1

Returns:

new Chain object

species: str¶: Species as identified by ANARCI

tail: str¶: Constant region sequence

classmethod to_anarci_csv(chains: List[Chain], path)¶: Save multiple chains to ANARCI-like CSV

classmethod to_dataframe(chains: List[Chain])¶

Produce a Pandas dataframe with aligned chain sequences in the columns

Note: Contains only positions (columns) that are present in the provided chains, so number of columns can differ based on the input.

classmethod to_fasta(chains, path_or_fd, keep_tail=False, description='')¶: Save multiple chains to FASTA

to_seq_record(keep_tail=False, description='')¶: Create BioPython SeqRecord object from this Chain

v_gene: str¶: V gene germline as identified by ANARCI (if assign_germline is True)

Alignment¶

See the Example Jupyter Notebook for usage examples.

class abnumber.Alignment(positions, residues, scheme, chain_type)¶

Antibody chain alignment of two or more chains

>>> from abnumber import Chain
>>>
>>> seq1 = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAP'
>>> chain1 = Chain(seq1, scheme='imgt')
>>>
>>> seq2 = 'QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYDDYLDRWGQGTTLTVSSAKTTAP'
>>> chain2 = Chain(seq2, scheme='imgt')
>>> alignment = chain1.align(chain2)

Alignment can be sliced and iterated:

>>> for pos, (aa, bb) in alignment[:'5']:
...     print(pos, aa, bb)
H1  Q Q
H2  V V
H3  Q Q
H4  L L
H5  Q V
...

format(mark_identity=True, mark_cdrs=True)¶

Format alignment to string

Parameters:

mark_identity – Add BLAST style middle line showing identity (|), similar residue (+) or different residue (.)
mark_cdrs – Add line highlighting CDR regions using ^

Returns:

formatted string
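
The BLAST-style middle line can be sketched in a few lines of plain Python. This does not call AbNumber. A tiny hand-picked set of residue pairs stands in for a real BLOSUM62 lookup, "similar" is assumed to mean a positive substitution score, and `identity_line` is a hypothetical helper.

```python
# Stand-in for BLOSUM62: a few pairs with a positive score. Not the full matrix.
SIMILAR_PAIRS = {frozenset("ST"), frozenset("DE"), frozenset("KR")}


def identity_line(first, second):
    """Return '|' for identical, '+' for similar and '.' for different residues."""
    marks = []
    for a, b in zip(first, second):
        if a == b:
            marks.append("|")
        elif frozenset((a, b)) in SIMILAR_PAIRS:
            marks.append("+")
        else:
            marks.append(".")
    return "".join(marks)


# First 20 residues of the two chains used in the examples on this page.
top = "QVQLQQSGAELARPGASVKM"
bottom = "QVQLVQSGAELDRPGATVKM"
middle = identity_line(top, bottom)
print(top)
print(middle)
print(bottom)
```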

has_mutation()¶: Check if there is a mutation in the alignment or not

num_identical()¶: Get number of positions with identical residues

num_mutations()¶: Get number of mutations (positions with more than one type of residue)

num_similar()¶: Get number of positions with similar residues based on BLOSUM62
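
The counting methods above can be read as simple column statistics. The sketch below is plain Python and does not call AbNumber. It uses the CDR2 fragment from the alignment example and assumes that a gap against a residue counts as a mutation, which is an assumption and not confirmed by this page.

```python
# One column per aligned position, taken from the CDR2 region of the example alignment.
columns = list(zip("INPS-RGYT", "INPSDRSYT"))

# A column is identical when it holds a single residue type, mutated otherwise.
num_identical = sum(len(set(column)) == 1 for column in columns)
num_mutations = sum(len(set(column)) > 1 for column in columns)
has_mutation = num_mutations > 0

print(num_identical, num_mutations, has_mutation)
```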

print(mark_identity=True, mark_cdrs=True)¶

Print string representation of alignment created using Alignment.format()

>>> alignment.print()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
||||.||||||.||||+|||||||||||.||||||||||||||||+||||||||.|.||||||||||||||||||||||||||.+|||||||||||||||||....||.|||||||||||
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^^                                      ^^^^^^^^^^^^
>>> alignment.print(mark_identity=False, mark_cdrs=False)
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS

Parameters:

mark_identity – Add BLAST style middle line showing identity (|), similar residue (+) or different residue (.)
mark_cdrs – Add line highlighting CDR regions using ^

property raw¶

Access raw representation of this alignment to allow unaligned numeric indexing and slicing

>>> # Numbering of alignment.raw starts at 0
>>> alignment.raw[0]
'H1'
>>> # Slicing with string is based on scheme numbering, the end is inclusive
>>> chain['1':'10']
'QVQLQQSGAE'
>>> # Slicing with chain.raw starts at 0, the end is exclusive (Python style)
>>> chain.raw[0:10]
'QVQLQQSGAE'

Returns: Raw alignment accessor that can be sliced or indexed to produce a new Alignment object

slice(start: Optional[Union[str, int, Position]] = None, stop: Optional[Union[str, int, Position]] = None, stop_inclusive: bool = True, allow_raw: bool = False)¶

Create a slice of this alignment

You can also slice directly using alignment['111':'112A'] or alignment.raw[10:20].

Parameters:

start – Slice start position (inclusive), Position or string (e.g. ‘111A’)
stop – Slice stop position (inclusive by default, see stop_inclusive), Position or string (e.g. ‘112A’)
stop_inclusive – Include stop position in slice
allow_raw – Allow unaligned numeric indexing from 0 to length of sequence - 1

Returns:

new sliced Alignment object

Position¶

See the Example Jupyter Notebook for usage examples.

class abnumber.Position(chain_type: str, number: int, letter: str, scheme: str)¶

Numbered position using a given numbering scheme

Used as a key to store Position -> Amino acid information.

Position objects are sortable according to the scheme simply using sorted().
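
Sorting by position matters because position labels do not sort correctly as plain strings. The following plain-Python sketch shows a sort key built from the chain type, number and insertion letter. It does not call AbNumber, `position_key` is a hypothetical helper, and any scheme-specific ordering rules for insertions are not modelled.

```python
import re


def position_key(label):
    """Turn a label such as 'H100A' into a sortable (chain type, number, letter) tuple."""
    match = re.fullmatch(r"([HL])(\d+)([A-Z]?)", label)
    if match is None:
        raise ValueError(f"Unrecognised position label: {label!r}")
    chain_type, number, letter = match.groups()
    return chain_type, int(number), letter


labels = ["H100A", "H9", "H100", "H10", "H100B"]
ordered = sorted(labels, key=position_key)
naive = sorted(labels)  # plain string sort puts H9 last

print(ordered)
print(naive)
```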

format(chain_type=True, region=False, rjust=False, ljust=False, fillchar=' ')¶

Format Position to string

Parameters:

chain_type – Add chain type prefix (H/L)
region – Add region prefix (FR1, CDR1, …)
rjust – Align text to the right
ljust – Align text to the left
fillchar – Character to use for alignment padding

Returns:

formatted string

classmethod from_string(position, chain_type, scheme)¶

Create Position object from string, e.g. “H5”

Note that Positions parsed from string do not support separate CDR definitions.

get_region()¶

Get string name of this position’s region

Returns: uppercase string, one of: "FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"

is_in_cdr()¶: Check if given position is found in the CDR regions
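
For the IMGT scheme, the region of a position follows from its number alone. A minimal plain-Python sketch is shown below, assuming the standard IMGT boundaries (FR1 1-26, CDR1 27-38, FR2 39-55, CDR2 56-65, FR3 66-104, CDR3 105-117, FR4 from 118). It does not call AbNumber, does not cover other schemes or separate CDR definitions, and `imgt_region` and `imgt_is_in_cdr` are hypothetical helpers.

```python
from bisect import bisect_right

# First IMGT position number of each region, in order.
REGION_STARTS = [1, 27, 39, 56, 66, 105, 118]
REGION_NAMES = ["FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"]


def imgt_region(number):
    """Return the region name for an IMGT position number."""
    if number < 1:
        raise ValueError("IMGT position numbers start at 1")
    # bisect_right counts how many region starts are <= number.
    return REGION_NAMES[bisect_right(REGION_STARTS, number) - 1]


def imgt_is_in_cdr(number):
    """Check whether an IMGT position number falls into CDR1, CDR2 or CDR3."""
    return imgt_region(number).startswith("CDR")


print(imgt_region(26), imgt_region(27), imgt_region(117), imgt_region(118))
```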