Welcome to AbNumber’s documentation!
Antibody chain representation, alignment and numbering using ANARCI
Getting Started
Install AbNumber using Bioconda:
conda install -c bioconda abnumber
Credits
This tool is based on ANARCI. Please cite the ANARCI paper: ANARCI: antigen receptor numbering and receptor classification
Examples
See the Example Jupyter Notebook for usage examples.
Chain
- class abnumber.Chain(sequence, scheme, cdr_definition=None, name=None, assign_germline=False, allowed_species=None, **kwargs)
Antibody chain aligned to a chosen antibody numbering scheme
- Example:
>>> from abnumber import Chain
>>>
>>> seq = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAPSVYPLA'
>>> chain = Chain(seq, scheme='imgt')
>>> chain
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^
Chain can be iterated:
>>> for pos, aa in chain:
...     print(pos, aa)
H1 Q
H2 V
H3 Q
H4 L
H5 Q
...
Chain can also be indexed and sliced using scheme numbering:
>>> chain['5']
'Q'
>>> for pos, aa in chain['H2':'H5']:
...     print(pos, aa)
H2 V
H3 Q
H4 L
H5 Q
- Parameters:
sequence – Unaligned string sequence
name – Optional sequence identifier
scheme – Numbering scheme: One of imgt, chothia, kabat, aho
cdr_definition – Numbering scheme to be used for definition of CDR regions. Same as scheme by default. One of imgt, chothia, kabat, north. Required for aho.
assign_germline – Assign germline name using ANARCI based on best sequence identity
allowed_species – Allowed species for germline assignment. Use None to allow all species, or one or more of: 'human', 'mouse', 'rat', 'rabbit', 'rhesus', 'pig', 'alpaca'
aa_dict – (Internal use only) Create Chain object directly from dictionary of region objects
tail – (Internal use only) Constant region sequence
species – (Internal use only) Species as identified by ANARCI
germline – (Internal use only) Germline as identified by ANARCI
- align(*other) → Alignment
Align this chain to other chains by using their existing numbering
>>> from abnumber import Chain
>>>
>>> seq1 = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAP'
>>> chain1 = Chain(seq1, scheme='imgt')
>>>
>>> seq2 = 'QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYDDYLDRWGQGTTLTVSSAKTTAP'
>>> chain2 = Chain(seq2, scheme='imgt')
>>>
>>> alignment = chain1.align(chain2)
>>> print(alignment.format())
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
||||.||||||.||||+|||||||||||.||||||||||||||||+||||||||.|.||||||||||||||||||||||||||.+|||||||||||||||||....||.|||||||||||
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^^                                      ^^^^^^^^^^^^
- property cdr1_seq
Unaligned string representation of the CDR 1 region sequence
- property cdr2_seq
Unaligned string representation of the CDR 2 region sequence
- property cdr3_seq
Unaligned string representation of the CDR 3 region sequence
- cdr_definition: str
Numbering scheme to be used for definition of CDR regions (same as scheme by default)
- chain_type: str
Chain type as identified by ANARCI: H (heavy), K (kappa light) or L (lambda light). See also Chain.is_heavy_chain() and Chain.is_light_chain().
- clone(replace_seq: Optional[str] = None)
Create a copy of this chain, optionally with a replacement sequence
- Parameters:
replace_seq – Optional replacement sequence, must be the same length as this chain's sequence
- Returns:
new Chain object
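Conceptually, a replacement sequence keeps every existing numbered position and assigns the new residues to those positions in order. The following is a minimal sketch of that idea using a plain dictionary of position labels to residues; `replace_residues` is a hypothetical helper, not part of AbNumber, and the real method works with `Position` objects.

```python
def replace_residues(numbered, replace_seq):
    """Return a new {position: residue} dict with the same positions as
    `numbered` but with residues taken from `replace_seq`, in order.

    Hypothetical helper illustrating Chain.clone(replace_seq=...);
    `numbered` is assumed to be an insertion-ordered dict.
    """
    if len(replace_seq) != len(numbered):
        raise ValueError(
            f"Replacement has length {len(replace_seq)}, expected {len(numbered)}"
        )
    # Pair each existing position with the next residue of the replacement
    return dict(zip(numbered, replace_seq))


original = {"H1": "Q", "H2": "V", "H3": "Q", "H4": "L", "H5": "Q"}
mutated = replace_residues(original, "EVQLV")
print(mutated)  # {'H1': 'E', 'H2': 'V', 'H3': 'Q', 'H4': 'L', 'H5': 'V'}
```

The length check mirrors the documented requirement that the replacement has the same length; the numbering itself is never recomputed.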
- find_human_germlines(limit=10, v_gene=None, j_gene=None, unique=True) → Tuple[List[Chain], List[Chain]]
Find the most identical V and J germline sequences based on IMGT alignment
- Parameters:
limit – Number of best matching germlines to return
v_gene – Filter germlines to specific V gene name
j_gene – Filter germlines to specific J gene name
unique – Skip germlines with duplicate amino acid sequence
- Returns:
list of top V chains, list of top J chains
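The ranking idea behind `limit` and `unique` can be sketched in plain Python. This is only an illustration under simplifying assumptions: positions are plain strings, identity is the fraction of query positions carrying the same residue, and `rank_by_identity` and the germline names are hypothetical. AbNumber's own scoring against the IMGT alignment may differ in detail.

```python
def rank_by_identity(query, germlines, limit=10, unique=True):
    """Rank candidate germlines by sequence identity to a query.

    query:     {position: residue}
    germlines: {germline_name: {position: residue}}
    Returns up to `limit` (name, identity) pairs, best first.
    """
    def identity(germline):
        # Fraction of query positions where the germline has the same residue
        same = sum(1 for pos, aa in query.items() if germline.get(pos) == aa)
        return same / len(query)

    # sorted() is stable, so candidates with equal identity keep input order
    scored = sorted(
        ((name, identity(residues)) for name, residues in germlines.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    results, seen_sequences = [], set()
    for name, score in scored:
        sequence = "".join(germlines[name].values())
        if unique and sequence in seen_sequences:
            continue  # skip germlines with a duplicate amino acid sequence
        seen_sequences.add(sequence)
        results.append((name, score))
    return results[:limit]


# Hypothetical germline fragments, for illustration only
query = {"H1": "Q", "H2": "V", "H3": "Q", "H4": "L"}
germlines = {
    "germA*01": {"H1": "Q", "H2": "V", "H3": "Q", "H4": "L"},
    "germA*02": {"H1": "Q", "H2": "V", "H3": "Q", "H4": "L"},
    "germB*01": {"H1": "E", "H2": "V", "H3": "Q", "H4": "L"},
    "germC*01": {"H1": "E", "H2": "I", "H3": "Q", "H4": "L"},
}
print(rank_by_identity(query, germlines, limit=2))
# [('germA*01', 1.0), ('germB*01', 0.75)]
```

With `unique=False`, the duplicate `germA*02` would take the second slot instead of `germB*01`.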
- find_merged_human_germline(top=0, v_gene=None, j_gene=None) → Chain
Find the n-th most identical V and J germline sequences based on IMGT alignment and merge them into one Chain
- Parameters:
top – Rank of the germline to return, ordered by identity (0-indexed, so 0 is the most identical)
v_gene – Filter germlines to specific V gene name
j_gene – Filter germlines to specific J gene name
- Returns:
merged germline sequence Chain object
- format(method='wide', **kwargs)
Format sequence to string
- Parameters:
method – use "wide" for Chain.format_wide() or "tall" for Chain.format_tall()
- Returns:
formatted string
- format_tall(columns=5)
Create string with one position per line, showing position numbers and amino acids
- Returns:
formatted string
- format_wide(numbering=False)
Create string with sequence on first line and CDR regions highlighted with ^ on second line
- Parameters:
numbering – Add position numbers on top
- Returns:
formatted string
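A rough sketch of the two-line "wide" layout, assuming each residue is already labelled with its region name. `format_wide_sketch` is a hypothetical helper written for illustration, not AbNumber's implementation, and it does not support the `numbering` option.

```python
def format_wide_sketch(residues):
    """Build a sequence line and a CDR marker line.

    residues: list of (region_name, amino_acid) pairs in sequence order.
    """
    seq_line = "".join(aa for _, aa in residues)
    # Place ^ under every residue whose region name starts with "CDR"
    cdr_line = "".join(
        "^" if region.startswith("CDR") else " " for region, _ in residues
    )
    return seq_line + "\n" + cdr_line


residues = [("FR1", "A"), ("FR1", "S"), ("CDR1", "G"), ("CDR1", "Y"),
            ("FR2", "M"), ("CDR2", "I")]
print(format_wide_sketch(residues))
# ASGYMI
#   ^^ ^
```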
- property fr1_seq
Unaligned string representation of the Framework 1 region sequence
- property fr2_seq
Unaligned string representation of the Framework 2 region sequence
- property fr3_seq
Unaligned string representation of the Framework 3 region sequence
- property fr4_seq
Unaligned string representation of the Framework 4 region sequence
- classmethod from_fasta(path_or_handle, scheme, cdr_definition=None, as_series=False, as_generator=False, **kwargs) → Union[List[Chain], Series, Generator[Chain, None, None]]
Read multiple chains from FASTA
- get_position_by_raw_index(index)
Get Position object at corresponding raw numeric position
- graft_cdrs_onto(other: Chain, backmutate_vernier=False, backmutations: List[Union[Position, str]] = [], name: Optional[str] = None) → Chain
Graft CDRs from this Chain onto another chain
- Parameters:
other – Chain to graft CDRs onto (source of frameworks and tail sequence)
backmutate_vernier – Also graft all Kabat Vernier positions from this chain (perform backmutations)
backmutations – List of positions that should additionally be grafted from this chain (str or Position)
name – Name of new Chain. If not provided, use name of this chain.
- Returns:
Chain with CDRs grafted from this chain and frameworks from the given chain
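The grafting logic can be pictured as merging two position-to-residue dictionaries: CDR positions come from the donor (this chain), framework positions from the acceptor (`other`), and any back-mutated positions from the donor. This sketch assumes integer IMGT position numbers without insertion codes and a caller-supplied set of CDR positions; `graft_cdrs` is hypothetical and ignores the tail sequence and the Vernier zone.

```python
def graft_cdrs(donor, acceptor, cdr_positions, backmutations=()):
    """Combine donor CDRs with acceptor frameworks.

    donor, acceptor: {imgt_number: residue}
    cdr_positions:   set of position numbers that belong to CDRs
    backmutations:   extra positions to take from the donor
    """
    from_donor = set(cdr_positions) | set(backmutations)
    grafted = {}
    # Framework residues (minus back-mutated positions) come from the acceptor
    for pos, aa in acceptor.items():
        if pos not in from_donor:
            grafted[pos] = aa
    # CDR residues and back-mutations come from the donor
    for pos, aa in donor.items():
        if pos in from_donor:
            grafted[pos] = aa
    # Restore numbering order
    return dict(sorted(grafted.items()))


# IMGT CDR1 27-38, CDR2 56-65, CDR3 105-117
IMGT_CDR_POSITIONS = set(range(27, 39)) | set(range(56, 66)) | set(range(105, 118))

donor = {1: "Q", 2: "V", 27: "G", 28: "Y", 39: "M"}
acceptor = {1: "E", 2: "V", 27: "A", 39: "L", 40: "H"}
print(graft_cdrs(donor, acceptor, IMGT_CDR_POSITIONS))
# {1: 'E', 2: 'V', 27: 'G', 28: 'Y', 39: 'L', 40: 'H'}
print(graft_cdrs(donor, acceptor, IMGT_CDR_POSITIONS, backmutations=[39]))
# {1: 'E', 2: 'V', 27: 'G', 28: 'Y', 39: 'M', 40: 'H'}
```

Because the donor CDR can be longer than the acceptor's, positions such as 28 above appear in the result even though the acceptor lacks them.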
- graft_cdrs_onto_human_germline(v_gene=None, j_gene=None, backmutate_vernier=False, backmutations: List[Union[Position, str]] = [])
Graft CDRs from this Chain onto the nearest human germline sequence
- Parameters:
v_gene – Use defined V germline allele (e.g. IGHV1-18*01), gene (e.g. IGHV1-18) or family (e.g. IGHV1)
j_gene – Use defined J germline allele (e.g. IGHJ1*01) or gene (e.g. IGHJ1)
backmutate_vernier – Also graft all Kabat Vernier positions from this chain (perform backmutations)
backmutations – List of positions that should additionally be grafted from this chain (str or Position)
- Returns:
Chain with CDRs grafted from this chain and frameworks from the selected human germline
- is_heavy_chain()
Check if this chain is a heavy chain (chain_type=="H")
- is_kappa_light_chain()
Check if this chain is a kappa light chain (chain_type=="K")
- is_lambda_light_chain()
Check if this chain is a lambda light chain (chain_type=="L")
- is_light_chain()
Check if this chain is a light chain (chain_type=="K" or chain_type=="L")
- j_gene: str
J gene germline as identified by ANARCI (if assign_germline is True)
- name: str
User-provided sequence identifier
- print(method='wide', **kwargs)
Print string representation using Chain.format()
By default, this produces the “wide” format, with the sequence on the first line and CDR regions highlighted with ^ on the second line:
>>> chain.print()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^
- Parameters:
method – use "wide" for Chain.format_wide() or "tall" for Chain.format_tall()
- print_tall(columns=5)
Print string representation using Chain.format_tall()
>>> chain.print_tall()
FR1 H1 Q
FR1 H2 V
FR1 H3 Q
FR1 H4 L
FR1 H5 Q
FR1 H6 Q
FR1 H7 S
...
- print_wide(numbering=False)
Print string representation using Chain.format_wide()
>>> chain.print_wide()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^
- property raw
Access raw representation of this chain to allow unaligned numeric indexing and slicing
>>> # String numbering is based on scheme numbering
>>> chain['1']
'Q'
>>> # Numbering of ``chain.raw`` starts at 0
>>> chain.raw[0]
'Q'
>>> # Slicing with string is based on scheme numbering, the end is inclusive
>>> chain['1':'10']
'QVQLQQSGAE'
>>> # Slicing with ``chain.raw`` starts at 0, the end is exclusive (Python style)
>>> chain.raw[0:10]
'QVQLQQSGAE'
- Returns:
Raw chain accessor that can be sliced or indexed to produce a new Chain object
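The difference between the two slicing styles can be illustrated without AbNumber. In this hypothetical sketch, position labels (including an insertion code, `3A`) serve as slice bounds with an inclusive end, while raw slicing uses 0-based indices with an exclusive end. Unlike the real accessor, the sketch requires both labels to exist in the numbering.

```python
def scheme_slice(numbered, start, stop):
    """Slice by position label; the stop label is included."""
    labels = list(numbered)
    i, j = labels.index(start), labels.index(stop)
    return "".join(numbered[label] for label in labels[i:j + 1])


def raw_slice(numbered, start, stop):
    """Slice by 0-based index; the stop index is excluded (Python style)."""
    labels = list(numbered)
    return "".join(numbered[label] for label in labels[start:stop])


# Hypothetical numbering with one insertion code (3A)
numbered = {"1": "Q", "2": "V", "3": "Q", "3A": "S", "4": "L"}
print(scheme_slice(numbered, "2", "4"))  # VQSL
print(raw_slice(numbered, 1, 4))         # VQS
```

Insertion codes are why the two styles diverge: label "4" is the fifth residue here, not the fourth.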
- property regions
Dictionary of region dictionaries
Region is an uppercase string, one of:
"FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"
- Returns:
Dictionary of Region name -> Dictionary of (Position -> Amino acid)
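Assuming the nested structure described above, the per-region sequence properties (`fr1_seq`, `cdr1_seq` and so on) and `seq` amount to joining residues region by region. A hypothetical sketch that uses string position labels in place of `Position` objects; `region_sequences` is not an AbNumber function.

```python
REGION_NAMES = ["FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"]


def region_sequences(regions):
    """Map each region name to its unaligned sequence string.

    regions: {region_name: {position_label: residue}}; missing regions
    are treated as empty.
    """
    return {name: "".join(regions.get(name, {}).values()) for name in REGION_NAMES}


regions = {
    "FR1": {"H1": "Q", "H2": "V"},
    "CDR1": {"H27": "G", "H28": "Y"},
    "FR2": {"H39": "M"},
}
per_region = region_sequences(regions)
full_seq = "".join(per_region.values())  # regions joined in order FR1 to FR4
print(per_region["CDR1"], full_seq)  # GY QVGYM
```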
- renumber(scheme=None, cdr_definition=None, allowed_species=None)
Return a copy of this chain aligned using a different numbering scheme or CDR definition
- Parameters:
scheme – Change numbering scheme: One of imgt, chothia, kabat, aho.
cdr_definition – Change CDR definition scheme: One of imgt, chothia, kabat, north.
allowed_species – Use None to allow all species, or one or more of: 'human', 'mouse', 'rat', 'rabbit', 'rhesus', 'pig', 'alpaca'
- scheme: str
Numbering scheme used to align the sequence
- property seq
Unaligned string representation of the variable chain sequence
- Returns:
Unaligned string representation of the variable chain sequence
- slice(replace_seq: Optional[str] = None, start: Optional[Union[str, int, Position]] = None, stop: Optional[Union[str, int, Position]] = None, stop_inclusive: bool = True, allow_raw: bool = False)
Create a slice of this chain, optionally with a replacement sequence that is placed into the same numbering
You can also slice directly using chain['111':'112A'] or chain.raw[10:20].
- Parameters:
replace_seq – Optional replacement sequence, needs to be the same length
start – Optional slice start position (inclusive), Position or string (e.g. ‘111A’)
stop – Optional slice stop position (inclusive by default, see stop_inclusive), Position or string (e.g. ‘112A’)
stop_inclusive – Include stop position in slice
allow_raw – Allow unaligned numeric indexing from 0 to length of sequence - 1
- Returns:
new Chain object
- species: str
Species as identified by ANARCI
- tail: str
Constant region sequence
- classmethod to_dataframe(chains: List[Chain])
Produce a Pandas DataFrame with the aligned chain sequences, one column per numbered position
Note: Contains only positions (columns) that are present in the provided chains, so the number of columns can differ based on the input.
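The note above means the column set is the union of positions seen in the input. A plain-Python sketch of that behaviour, without pandas; `to_table`, the integer positions and the `-` fill value for absent positions are all assumptions of the sketch rather than AbNumber's documented output.

```python
def to_table(chains, fill="-"):
    """Tabulate aligned chains.

    chains: {chain_name: {position_number: residue}}
    Returns (columns, rows), where rows maps each chain name to a list of
    residues aligned to `columns`.
    """
    # Only positions present in at least one chain become columns
    columns = sorted({pos for residues in chains.values() for pos in residues})
    rows = {
        name: [residues.get(pos, fill) for pos in columns]
        for name, residues in chains.items()
    }
    return columns, rows


chains = {"a": {1: "Q", 2: "V", 4: "L"}, "b": {1: "E", 3: "S", 4: "L"}}
columns, rows = to_table(chains)
print(columns)    # [1, 2, 3, 4]
print(rows["b"])  # ['E', '-', 'S', 'L']
```

Dropping chain `b` from the input would also drop column 3, which is exactly why the number of columns depends on the input.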
- classmethod to_fasta(chains, path_or_fd, keep_tail=False, description='')
Save multiple chains to FASTA
- to_seq_record(keep_tail=False, description='')
Create BioPython SeqRecord object from this Chain
- v_gene: str
V gene germline as identified by ANARCI (if assign_germline is True)
Alignment
See the Example Jupyter Notebook for usage examples.
- class abnumber.Alignment(positions, residues, scheme, chain_type)
Antibody chain alignment of two or more chains
>>> from abnumber import Chain
>>>
>>> seq1 = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAP'
>>> chain1 = Chain(seq1, scheme='imgt')
>>>
>>> seq2 = 'QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYDDYLDRWGQGTTLTVSSAKTTAP'
>>> chain2 = Chain(seq2, scheme='imgt')
>>> alignment = chain1.align(chain2)
Alignment can be sliced and iterated:
>>> for pos, (aa, bb) in alignment[:'5']:
...     print(pos, aa, bb)
H1 Q Q
H2 V V
H3 Q Q
H4 L L
H5 Q V
...
- format(mark_identity=True, mark_cdrs=True)
Format alignment to string
- Parameters:
mark_identity – Add BLAST style middle line showing identity (|), similar residue (+) or different residue (.)
mark_cdrs – Add line highlighting CDR regions using ^
- Returns:
formatted string
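The BLAST-style middle line can be sketched as a per-column comparison. AbNumber judges similarity using BLOSUM62; this hypothetical sketch instead takes a similarity predicate as a parameter and, for the example, uses a small toy grouping of residues (S/T, D/E, I/L/V) that is not a substitute for BLOSUM62.

```python
def match_line(seq_a, seq_b, is_similar):
    """BLAST-style match line for two equal-length aligned strings.

    '|' identical, '+' similar according to is_similar(a, b), '.' different.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Aligned sequences must have equal length")
    marks = []
    for a, b in zip(seq_a, seq_b):
        if a == b:
            marks.append("|")
        elif is_similar(a, b):
            marks.append("+")
        else:
            marks.append(".")
    return "".join(marks)


# Toy similarity groups for illustration only; AbNumber uses BLOSUM62
TOY_GROUPS = [set("ST"), set("DE"), set("ILV")]


def toy_similar(a, b):
    return any(a in group and b in group for group in TOY_GROUPS)


print(match_line("QVQLQQSGAS", "QVQLVQSGAT", toy_similar))  # ||||.||||+
```

A gap character such as `-` belongs to no group, so a gap against a residue is marked as different, as in the alignment output above.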
- has_mutation()
Check whether the alignment contains at least one mutation
- num_identical()
Get number of positions with identical residues
- num_mutations()
Get number of mutations (positions with more than one type of residue)
- num_similar()
Get number of positions with similar residues based on BLOSUM62
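Following the definitions above, the identity and mutation counts reduce to checking how many distinct residues each aligned column contains. A minimal sketch, assuming the alignment is given as a list of residue tuples (one tuple per position) and that gaps are compared like any other symbol; `alignment_stats` is hypothetical, and the BLOSUM62-based similarity count is not reproduced.

```python
def alignment_stats(columns):
    """Count identical and mutated positions.

    columns: list of tuples, one tuple of residues per aligned position.
    """
    identical = sum(1 for column in columns if len(set(column)) == 1)
    # A mutation is a position with more than one type of residue
    mutations = sum(1 for column in columns if len(set(column)) > 1)
    return {"identical": identical, "mutations": mutations,
            "has_mutation": mutations > 0}


columns = list(zip("QVQLQ", "QVQLV"))
print(alignment_stats(columns))
# {'identical': 4, 'mutations': 1, 'has_mutation': True}
```

Because the test is "more than one residue type", the same code works unchanged for alignments of three or more chains.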
- print(mark_identity=True, mark_cdrs=True)
Print string representation of alignment created using Alignment.format()
>>> alignment.print()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
||||.||||||.||||+|||||||||||.||||||||||||||||+||||||||.|.||||||||||||||||||||||||||.+|||||||||||||||||....||.|||||||||||
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^^                                      ^^^^^^^^^^^^
>>> alignment.print(mark_identity=False, mark_cdrs=False)
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
- Parameters:
mark_identity – Add BLAST style middle line showing identity (|), similar residue (+) or different residue (.)
mark_cdrs – Add line highlighting CDR regions using ^
- property raw
Access raw representation of this alignment to allow unaligned numeric indexing and slicing
>>> # Numbering of ``alignment.raw`` starts at 0
>>> alignment.raw[0]
'H1'
>>> # Slicing with string is based on scheme numbering, the end is inclusive
>>> chain['1':'10']
'QVQLQQSGAE'
>>> # Slicing with ``chain.raw`` starts at 0, the end is exclusive (Python style)
>>> chain.raw[0:10]
'QVQLQQSGAE'
- Returns:
Raw alignment accessor that can be sliced or indexed to produce a new Alignment object
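- slice(start: Optional[Union[str, int, Position]] = None, stop: Optional[Union[str, int, Position]] = None, stop_inclusive: bool = True, allow_raw: bool = False)
Create a slice of this alignment
You can also slice directly using alignment['111':'112A'] or alignment.raw[10:20].
- Parameters:
start – Optional slice start position (inclusive), Position or string (e.g. ‘111A’)
stop – Optional slice stop position (inclusive by default, see stop_inclusive), Position or string (e.g. ‘112A’)
stop_inclusive – Include stop position in slice
allow_raw – Allow unaligned numeric indexing from 0 to length of sequence - 1
- Returns: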
new sliced Alignment object
Position
See the Example Jupyter Notebook for usage examples.
- class abnumber.Position(chain_type: str, number: int, letter: str, scheme: str)
Numbered position using a given numbering scheme
Used as a key to store Position -> Amino acid information.
Position objects are sortable according to the scheme simply using sorted().
- format(chain_type=True, region=False, rjust=False, ljust=False, fillchar=' ')
Format Position to string
- Parameters:
chain_type – Add chain type prefix (H/L)
region – Add region prefix (FR1, CDR1, …)
rjust – Align text to the right
ljust – Align text to the left
fillchar – Character to use for alignment padding
- Returns:
formatted string
- classmethod from_string(position, chain_type, scheme)
Create Position object from string, e.g. “H5”
Note that Positions parsed from string do not support separate CDR definitions.
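Position strings such as `H5` or `H112A` combine an optional chain type, a number and an optional insertion letter. The sketch below splits such strings with a regular expression and sorts them by (number, letter), which matches the ascending insertion order used by Kabat and Chothia (82, 82A, 82B, 83). `parse_position` and `sort_key` are hypothetical helpers, not `Position.from_string`, and the key does not handle IMGT CDR3, where insertions at position 112 run in descending order (112B before 112A before 112).

```python
import re

# Optional chain type, position number, optional insertion letter
POSITION_PATTERN = re.compile(r"^([HKL]?)(\d+)([A-Z]?)$")


def parse_position(text):
    """Split e.g. 'H112A' into ('H', 112, 'A')."""
    match = POSITION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Cannot parse position: {text!r}")
    chain_type, number, letter = match.groups()
    return chain_type, int(number), letter


def sort_key(text):
    # '' sorts before 'A', so 82 < 82A < 82B < 83 (Kabat/Chothia style)
    _, number, letter = parse_position(text)
    return number, letter


print(parse_position("H112A"))                               # ('H', 112, 'A')
print(sorted(["H83", "H82B", "H82", "H82A"], key=sort_key))  # ['H82', 'H82A', 'H82B', 'H83']
```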
- get_region()
Get string name of this position’s region
- Returns:
uppercase string, one of:
"FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"
- is_in_cdr()
Check whether this position lies in one of the CDR regions
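With the IMGT CDR definition, region membership depends only on the position number. A sketch assuming the standard IMGT boundaries (CDR1 27-38, CDR2 56-65, CDR3 105-117); `imgt_region` and `is_in_cdr_imgt` are hypothetical helpers, and other CDR definitions (Kabat, Chothia, North) use different boundaries.

```python
# Standard IMGT region boundaries (inclusive position numbers)
IMGT_REGIONS = [
    ("FR1", 1, 26),
    ("CDR1", 27, 38),
    ("FR2", 39, 55),
    ("CDR2", 56, 65),
    ("FR3", 66, 104),
    ("CDR3", 105, 117),
    ("FR4", 118, 128),
]


def imgt_region(number):
    """Region name for an IMGT position number (insertion letters ignored)."""
    for name, first, last in IMGT_REGIONS:
        if first <= number <= last:
            return name
    raise ValueError(f"Position {number} is outside IMGT numbering 1-128")


def is_in_cdr_imgt(number):
    return imgt_region(number).startswith("CDR")


print(imgt_region(111), is_in_cdr_imgt(104))  # CDR3 False
```

Insertion positions such as 111A fall into the region of their number, so passing 111 is enough for them.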