Welcome to AbNumber’s documentation!

Antibody chain representation, alignment and numbering using ANARCI

Getting Started

Install AbNumber using Bioconda:

conda install -c bioconda abnumber

Credits

This tool is based on ANARCI. Please cite the ANARCI paper: ANARCI: antigen receptor numbering and receptor classification

Examples

See the Example Jupyter Notebook for usage examples.

Chain

class abnumber.Chain(sequence, scheme, cdr_definition=None, name=None, assign_germline=False, allowed_species=None, **kwargs)

Antibody chain aligned to a chosen antibody numbering scheme

Example:

>>> from abnumber import Chain
>>>
>>> seq = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAPSVYPLA'
>>> chain = Chain(seq, scheme='imgt')
>>> chain
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^

Chain can be iterated:

>>> for pos, aa in chain:
...     print(pos, aa)
H1  Q
H2  V
H3  Q
H4  L
H5  Q
...

Chain can also be indexed and sliced using scheme numbering:

>>> chain['5']
'Q'
>>> for pos, aa in chain['H2':'H5']:
...     print(pos, aa)
H2  V
H3  Q
H4  L
H5  Q
Parameters:
  • sequence – Unaligned string sequence

  • name – Optional sequence identifier

  • scheme – Numbering scheme: One of imgt, chothia, kabat, aho

  • cdr_definition – Numbering scheme to be used for definition of CDR regions. Same as scheme by default. One of imgt, chothia, kabat, north. Required for aho.

  • assign_germline – Assign germline name using ANARCI based on best sequence identity

  • allowed_species – Allowed species for germline assignment. Use None to allow all species, or one or more of: 'human', 'mouse', 'rat', 'rabbit', 'rhesus', 'pig', 'alpaca'

  • aa_dict – (Internal use only) Create Chain object directly from dictionary of region objects

  • tail – (Internal use only) Constant region sequence

  • species – (Internal use only) Species as identified by ANARCI

  • germline – (Internal use only) Germline as identified by ANARCI

align(*other) Alignment

Align this chain to other chains by using their existing numbering

>>> from abnumber import Chain
>>>
>>> seq1 = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAP'
>>> chain1 = Chain(seq1, scheme='imgt')
>>>
>>> seq2 = 'QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYDDYLDRWGQGTTLTVSSAKTTAP'
>>> chain2 = Chain(seq2, scheme='imgt')
>>>
>>> alignment = chain1.align(chain2)
>>> print(alignment.format())
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
||||.||||||.||||+|||||||||||.||||||||||||||||+||||||||.|.||||||||||||||||||||||||||.+|||||||||||||||||....||.|||||||||||
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^^                                      ^^^^^^^^^^^^
Parameters:

other – The Chain object to align, can be repeated to create a multiple sequence alignment

Returns:

Alignment object
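
The idea of aligning by existing numbering, rather than computing a new pairwise alignment, can be sketched in plain Python. This is an illustration only, not AbNumber's implementation: the `align_by_numbering` helper, the dictionary representation and the toy position keys are all hypothetical.

```python
def align_by_numbering(*chains):
    """Align chains that share a numbering scheme by taking the union of positions.

    Each chain is a dict mapping a sortable position key, here a
    (number, insertion_letter) tuple, to a one-letter amino acid.
    """
    # Collect every position that appears in at least one chain
    positions = sorted(set().union(*(chain.keys() for chain in chains)))
    # Fill positions that a chain lacks with a gap character
    rows = ["".join(chain.get(pos, "-") for pos in positions) for chain in chains]
    return positions, rows


# Hypothetical toy chains: the second one has an extra residue at position 55A
chain_a = {(53, ""): "P", (54, ""): "S", (56, ""): "R", (57, ""): "G"}
chain_b = {(53, ""): "P", (54, ""): "S", (55, "A"): "D", (56, ""): "R", (57, ""): "S"}

positions, rows = align_by_numbering(chain_a, chain_b)
for row in rows:
    print(row)
```

Because both inputs are already numbered, a gap appears wherever one chain has no residue at a position that the other chain has.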

property cdr1_seq

Unaligned string representation of the CDR 1 region sequence

property cdr2_seq

Unaligned string representation of the CDR 2 region sequence

property cdr3_seq

Unaligned string representation of the CDR 3 region sequence

cdr_definition: str

Numbering scheme to be used for definition of CDR regions (same as scheme by default)

chain_type: str

Chain type as identified by ANARCI: H (heavy), K (kappa light) or L (lambda light)

See also Chain.is_heavy_chain() and Chain.is_light_chain().

clone(replace_seq: Optional[str] = None)

Create a copy of this chain, optionally with a replacement sequence

Parameters:

replace_seq – Optional replacement sequence, needs to be the same length

Returns:

new Chain object

find_human_germlines(limit=10, v_gene=None, j_gene=None, unique=True) Tuple[List[Chain], List[Chain]]

Find most identical V and J germline sequences based on IMGT alignment

Parameters:
  • limit – Number of best matching germlines to return

  • v_gene – Filter germlines to specific V gene name

  • j_gene – Filter germlines to specific J gene name

  • unique – Skip germlines with duplicate amino acid sequence

Returns:

list of top V chains, list of top J chains
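
Ranking candidates by sequence identity can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the `identity` and `top_matches` helpers are hypothetical, and the germline names and residues are made-up toy data.

```python
def identity(query, candidate):
    """Fraction of query positions that carry the same residue in the candidate."""
    matches = sum(1 for pos, aa in query.items() if candidate.get(pos) == aa)
    return matches / len(query)


def top_matches(query, germlines, limit=10):
    """Return up to `limit` germline names, most identical first."""
    ranked = sorted(
        germlines,
        key=lambda name: identity(query, germlines[name]),
        reverse=True,
    )
    return ranked[:limit]


# Made-up toy data keyed by position number (not real germline sequences)
query = {1: "Q", 2: "V", 3: "Q", 4: "L"}
germlines = {
    "toy-germline-A": {1: "E", 2: "V", 3: "Q", 4: "L"},
    "toy-germline-B": {1: "Q", 2: "V", 3: "Q", 4: "L"},
    "toy-germline-C": {1: "E", 2: "I", 3: "V", 4: "M"},
}
print(top_matches(query, germlines, limit=2))
```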

find_merged_human_germline(top=0, v_gene=None, j_gene=None) Chain

Find n-th most identical V and J germline sequence based on IMGT alignment and merge them into one Chain

Parameters:
  • top – Rank of the germline to return, ordered from most to least identical (0-indexed, so 0 is the most identical)

  • v_gene – Filter germlines to specific V gene name

  • j_gene – Filter germlines to specific J gene name

Returns:

merged germline sequence Chain object

format(method='wide', **kwargs)

Format sequence to string

Parameters:

method – use "wide" for Chain.format_wide() or "tall" for Chain.format_tall()

Returns:

formatted string

format_tall(columns=5)

Create string with one position per line, showing position numbers and amino acids

Returns:

formatted string

format_wide(numbering=False)

Create string with sequence on first line and CDR regions highlighted with ^ on second line

Parameters:

numbering – Add position numbers on top

Returns:

formatted string
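
The two-line "wide" layout can be sketched in plain Python. The `format_wide_sketch` helper and the list of CDR flags are hypothetical and only illustrate the layout; they do not reflect how AbNumber derives CDR membership.

```python
def format_wide_sketch(residues, cdr_flags):
    """Return the sequence on one line and '^' under each CDR residue on the next."""
    sequence_line = "".join(residues)
    marker_line = "".join("^" if in_cdr else " " for in_cdr in cdr_flags)
    return sequence_line + "\n" + marker_line


# Fragment around CDR1 of the example sequence; the flags are set by hand
residues = "CKASGYTFTRYTMHW"
cdr_flags = [False] * 4 + [True] * 8 + [False] * 3

text = format_wide_sketch(residues, cdr_flags)
print(text)
```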

property fr1_seq

Unaligned string representation of the Framework 1 region sequence

property fr2_seq

Unaligned string representation of the Framework 2 region sequence

property fr3_seq

Unaligned string representation of the Framework 3 region sequence

property fr4_seq

Unaligned string representation of the Framework 4 region sequence

classmethod from_fasta(path_or_handle, scheme, cdr_definition=None, as_series=False, as_generator=False, **kwargs) Union[List[Chain], Series, Generator[Chain, None, None]]

Read multiple chains from FASTA

get_position_by_raw_index(index)

Get Position object at corresponding raw numeric position

graft_cdrs_onto(other: Chain, backmutate_vernier=False, backmutations: List[Union[Position, str]] = [], name: Optional[str] = None) Chain

Graft CDRs from this Chain onto another chain

Parameters:
  • other – Chain to graft CDRs into (source of frameworks and tail sequence)

  • backmutate_vernier – Also graft all Kabat Vernier positions from this chain (perform backmutations)

  • backmutations – List of positions that should additionally be grafted from this chain (str or Position)

  • name – Name of new Chain. If not provided, use name of this chain.

Returns:

Chain with CDRs grafted from this chain and frameworks from the given chain
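
Conceptually, grafting keeps the CDR residues of one chain and the framework residues of another. The sketch below illustrates that idea with nested dictionaries. The `graft_cdrs` helper and the toy data are hypothetical, and back-mutations (Vernier or user-specified) are not modeled.

```python
CDR_REGIONS = {"CDR1", "CDR2", "CDR3"}


def graft_cdrs(donor, acceptor):
    """Combine CDR residues of `donor` with framework residues of `acceptor`.

    Both arguments map a region name to a dict of position -> amino acid.
    """
    return {
        region: dict(donor[region] if region in CDR_REGIONS else acceptor[region])
        for region in acceptor
    }


# Made-up toy chains restricted to three regions for brevity
donor = {"FR1": {25: "S"}, "CDR1": {27: "G", 28: "Y"}, "FR2": {39: "M"}}
acceptor = {"FR1": {25: "A"}, "CDR1": {27: "G", 28: "F", 29: "T"}, "FR2": {39: "I"}}

grafted = graft_cdrs(donor, acceptor)
print(grafted)
```

Because regions are keyed by position, the grafted CDR may have a different length than the CDR it replaces.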

graft_cdrs_onto_human_germline(v_gene=None, j_gene=None, backmutate_vernier=False, backmutations: List[Union[Position, str]] = [])

Graft CDRs from this Chain onto the nearest human germline sequence

Parameters:
  • v_gene – Use defined V germline allele (e.g. IGHV1-18*01), gene (e.g. IGHV1-18) or family (e.g. IGHV1)

  • j_gene – Use defined J germline allele (e.g. IGHJ1*01) or gene (e.g. IGHJ1)

  • backmutate_vernier – Also graft all Kabat Vernier positions from this chain (perform backmutations)

  • backmutations – List of positions that should additionally be grafted from this chain (str or Position)

Returns:

Chain with CDRs grafted from this chain and frameworks from the nearest human germline sequence

is_heavy_chain()

Check if this chain is heavy chain (chain_type=="H")

is_kappa_light_chain()

Check if this chain is kappa light chain (chain_type=="K")

is_lambda_light_chain()

Check if this chain is lambda light chain (chain_type=="L")

is_light_chain()

Check if this chain is light chain (chain_type=="K" or chain_type=="L")

j_gene: str

J gene germline as identified by ANARCI (if assign_germline is True)

name: str

User-provided sequence identifier

property positions

Dictionary of Position -> Amino acid

print(method='wide', **kwargs)

Print string representation using Chain.format()

By default, produces “wide” format with sequence on first line and CDR regions highlighted with ^ on second line:

>>> chain.print()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^
Parameters:

method – use "wide" for Chain.format_wide() or "tall" for Chain.format_tall()

print_tall(columns=5)

Print string representation using Chain.format_tall()

>>> chain.print_tall()
FR1 H1    Q
FR1 H2    V
FR1 H3    Q
FR1 H4    L
FR1 H5    Q
FR1 H6    Q
FR1 H7    S
...
print_wide(numbering=False)

Print string representation using Chain.format_wide()

>>> chain.print_wide()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^                                      ^^^^^^^^^^^^
property raw

Access raw representation of this chain to allow unaligned numeric indexing and slicing

>>> # String numbering is based on scheme numbering
>>> chain['1']
'Q'
>>> # Numbering of ``chain.raw`` starts at 0
>>> chain.raw[0]
'Q'
>>> # Slicing with string is based on scheme numbering, the end is inclusive
>>> chain['1':'10']
'QVQLQQSGAE'
>>> # Slicing with ``chain.raw`` starts at 0, the end is exclusive (Python style)
>>> chain.raw[0:10]
'QVQLQQSGAE'
Returns:

Raw chain accessor that can be sliced or indexed to produce a new Chain object
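
The difference between the two slicing conventions (inclusive stop for scheme positions, exclusive stop for raw offsets) can be sketched without the library. The helpers below are hypothetical, and the toy numbering assumes no gaps or insertions, so label n sits at raw offset n - 1; in a real numbered chain the two can diverge.

```python
# Toy numbered sequence: scheme position label -> amino acid, in order
numbered = {str(i + 1): aa for i, aa in enumerate("QVQLQQSGAELARPG")}
labels = list(numbered)


def slice_by_scheme(start, stop):
    """Slice by position label; the stop label is included."""
    first, last = labels.index(start), labels.index(stop)
    return "".join(numbered[label] for label in labels[first:last + 1])


def slice_raw(start, stop):
    """Slice by 0-based offset; the stop offset is excluded, as in Python."""
    return "".join(numbered[label] for label in labels[start:stop])


print(slice_by_scheme("1", "10"))
print(slice_raw(0, 10))
```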

property regions

Dictionary of region dictionaries

Region is an uppercase string, one of: "FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"

Returns:

Dictionary of Region name -> Dictionary of (Position -> Amino acid)

renumber(scheme=None, cdr_definition=None, allowed_species=None)

Return copy of this chain aligned using a different numbering scheme or CDR definition

Parameters:
  • scheme – Change numbering scheme: One of imgt, chothia, kabat, aho.

  • cdr_definition – Change CDR definition scheme: One of imgt, chothia, kabat, north.

  • allowed_species – Use None to allow all species, or one or more of: 'human', 'mouse', 'rat', 'rabbit', 'rhesus', 'pig', 'alpaca'

scheme: str

Numbering scheme used to align the sequence

property seq

Unaligned string representation of the variable chain sequence

Returns:

Unaligned string representation of the variable chain sequence

slice(replace_seq: Optional[str] = None, start: Optional[Union[str, int, Position]] = None, stop: Optional[Union[str, int, Position]] = None, stop_inclusive: bool = True, allow_raw: bool = False)

Create a slice of this chain, optionally with a replacement sequence that is placed into the same numbering

You can also slice directly using chain['111':'112A'] or chain.raw[10:20].

Parameters:
  • replace_seq – Optional replacement sequence, needs to be the same length

  • start – Optional slice start position (inclusive), Position or string (e.g. ‘111A’)

  • stop – Optional slice stop position (inclusive), Position or string (e.g. ‘112A’)

  • stop_inclusive – Include stop position in slice

  • allow_raw – Allow unaligned numeric indexing from 0 to length of sequence - 1

Returns:

new Chain object

species: str

Species as identified by ANARCI

tail: str

Constant region sequence

classmethod to_anarci_csv(chains: List[Chain], path)

Save multiple chains to ANARCI-like CSV

classmethod to_dataframe(chains: List[Chain])

Produce a Pandas dataframe with aligned chain sequences in the columns

Note: Contains only positions (columns) that are present in the provided chains, so number of columns can differ based on the input.

classmethod to_fasta(chains, path_or_fd, keep_tail=False, description='')

Save multiple chains to FASTA

to_seq_record(keep_tail=False, description='')

Create BioPython SeqRecord object from this Chain

v_gene: str

V gene germline as identified by ANARCI (if assign_germline is True)

Alignment

See the Example Jupyter Notebook for usage examples.

class abnumber.Alignment(positions, residues, scheme, chain_type)

Antibody chain alignment of two or more chains

>>> from abnumber import Chain
>>>
>>> seq1 = 'QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSAKTTAP'
>>> chain1 = Chain(seq1, scheme='imgt')
>>>
>>> seq2 = 'QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYDDYLDRWGQGTTLTVSSAKTTAP'
>>> chain2 = Chain(seq2, scheme='imgt')
>>> alignment = chain1.align(chain2)

Alignment can be sliced and iterated:

>>> for pos, (aa, bb) in alignment[:'5']:
...     print(pos, aa, bb)
H1  Q Q
H2  V V
H3  Q Q
H4  L L
H5  Q V
...
format(mark_identity=True, mark_cdrs=True)

Format alignment to string

Parameters:
  • mark_identity – Add BLAST style middle line showing identity (|), similar residue (+) or different residue (.)

  • mark_cdrs – Add line highlighting CDR regions using ^

Returns:

formatted string
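
How a BLAST-style middle line is built can be sketched as follows. This is an illustration, not AbNumber's code: the `identity_line` helper is hypothetical, and `SIMILAR_PAIRS` is a small hand-picked subset of residue pairs with a positive BLOSUM62 score, not the full matrix.

```python
# Partial, hand-picked set of residue pairs with a positive BLOSUM62 score.
# It is an illustrative subset, not the full substitution matrix.
SIMILAR_PAIRS = {frozenset(pair) for pair in ("DE", "ST", "KR", "IV", "FY")}


def identity_line(seq_a, seq_b):
    """Build a BLAST-style middle line for two equal-length aligned strings."""
    symbols = []
    for a, b in zip(seq_a, seq_b):
        if a == b:
            symbols.append("|")  # identical residue
        elif frozenset((a, b)) in SIMILAR_PAIRS:
            symbols.append("+")  # similar residue
        else:
            symbols.append(".")  # different residue
    return "".join(symbols)


# First 17 aligned residues of the two chains from the Chain.align example
middle = identity_line("QVQLQQSGAELARPGAS", "QVQLVQSGAELDRPGAT")
print(middle)
```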

has_mutation()

Check if there is a mutation in the alignment or not

num_identical()

Get number of positions with identical residues

num_mutations()

Get number of mutations (positions with more than one type of residue)

num_similar()

Get number of positions with similar residues based on BLOSUM62
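
The counting methods above can be sketched over alignment columns in plain Python. The helper names are hypothetical, and how the library treats gap characters is not covered by this sketch.

```python
def count_mutations(columns):
    """Count alignment columns that contain more than one residue type."""
    return sum(1 for column in columns if len(set(column)) > 1)


def count_identical(columns):
    """Count alignment columns in which all residues are the same."""
    return sum(1 for column in columns if len(set(column)) == 1)


# Columns for positions H1 to H5 of the two chains in the example above
columns = [("Q", "Q"), ("V", "V"), ("Q", "Q"), ("L", "L"), ("Q", "V")]
print(count_identical(columns), count_mutations(columns))
```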

print(mark_identity=True, mark_cdrs=True)

Print string representation of alignment created using Alignment.format()

>>> alignment.print()
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
||||.||||||.||||+|||||||||||.||||||||||||||||+||||||||.|.||||||||||||||||||||||||||.+|||||||||||||||||....||.|||||||||||
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
                         ^^^^^^^^                 ^^^^^^^^^                                      ^^^^^^^^^^^^
>>> alignment.print(mark_identity=False, mark_cdrs=False)
QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPS-RGYTNYNQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSS
QVQLVQSGAELDRPGATVKMSCKASGYTTTRYTMHWVKQRPGQGLDWIGYINPSDRSYTNYNQKFKDKATLTTDKSSSTAYMQKTSLTSEDSAVYYCARYYD--DYLDRWGQGTTLTVSS
Parameters:
  • mark_identity – Add BLAST style middle line showing identity (|), similar residue (+) or different residue (.)

  • mark_cdrs – Add line highlighting CDR regions using ^

property raw

Access raw representation of this alignment to allow unaligned numeric indexing and slicing

>>> # Numbering of ``alignment.raw`` starts at 0
>>> alignment.raw[0]
'H1'
>>> # Slicing with string is based on scheme numbering, the end is inclusive
>>> chain['1':'10']
'QVQLQQSGAE'
>>> # Slicing with ``chain.raw`` starts at 0, the end is exclusive (Python style)
>>> chain.raw[0:10]
'QVQLQQSGAE'
Returns:

Raw alignment accessor that can be sliced or indexed to produce a new Alignment object

slice(start: Optional[Union[str, int, Position]] = None, stop: Optional[Union[str, int, Position]] = None, stop_inclusive: bool = True, allow_raw: bool = False)

Create a slice of this alignment

You can also slice directly using alignment['111':'112A'] or alignment.raw[10:20].

Parameters:
  • start – Slice start position (inclusive), Position or string (e.g. ‘111A’)

  • stop – Slice stop position (inclusive), Position or string (e.g. ‘112A’)

  • stop_inclusive – Include stop position in slice

  • allow_raw – Allow unaligned numeric indexing from 0 to length of sequence - 1

Returns:

new sliced Alignment object

Position

See the Example Jupyter Notebook for usage examples.

class abnumber.Position(chain_type: str, number: int, letter: str, scheme: str)

Numbered position using a given numbering scheme

Used as a key to store Position -> Amino acid information.

Position objects are sortable according to the numbering scheme simply using sorted().
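
Sorting numbered positions is not the same as sorting their string labels, because "H10" sorts before "H9" as text. The sketch below shows one way to order labels by number and then insertion letter. The `position_key` helper is hypothetical, and it assumes a Kabat-like ordering where insertion letters sort alphabetically; IMGT orders some insertions differently (for example around positions 111 and 112), which is not modeled here.

```python
import re


def position_key(label):
    """Split a label such as 'H100A' into (number, insertion letter)."""
    match = re.fullmatch(r"[HKL]?(\d+)([A-Z]?)", label)
    if match is None:
        raise ValueError(f"Not a position label: {label!r}")
    return int(match.group(1)), match.group(2)


labels = ["H100A", "H9", "H100", "H10", "H100B"]
ordered = sorted(labels, key=position_key)
print(ordered)
```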

format(chain_type=True, region=False, rjust=False, ljust=False, fillchar=' ')

Format Position to string

Parameters:
  • chain_type – Add chain type prefix (H/L)

  • region – Add region prefix (FR1, CDR1, …)

  • rjust – Align text to the right

  • ljust – Align text to the left

  • fillchar – Character to use for alignment padding

Returns:

formatted string

classmethod from_string(position, chain_type, scheme)

Create Position object from string, e.g. “H5”

Note that Positions parsed from string do not support separate CDR definitions.

get_region()

Get string name of this position’s region

Returns:

uppercase string, one of: "FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"

is_in_cdr()

Check if given position is found in the CDR regions
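
For the IMGT scheme, mapping a position number to its region can be sketched with the standard IMGT region boundaries. The `imgt_region` and `in_cdr` helpers are hypothetical and cover only IMGT numbering with IMGT CDR definitions; other schemes and CDR definitions use different boundaries.

```python
# Standard IMGT region boundaries (inclusive position numbers)
IMGT_REGIONS = [
    ("FR1", 1, 26),
    ("CDR1", 27, 38),
    ("FR2", 39, 55),
    ("CDR2", 56, 65),
    ("FR3", 66, 104),
    ("CDR3", 105, 117),
    ("FR4", 118, 128),
]


def imgt_region(number):
    """Return the region name for an IMGT position number."""
    for name, first, last in IMGT_REGIONS:
        if first <= number <= last:
            return name
    raise ValueError(f"Position {number} is outside IMGT variable domain numbering")


def in_cdr(number):
    """Check whether an IMGT position number falls in a CDR region."""
    return imgt_region(number).startswith("CDR")


print(imgt_region(5), imgt_region(30), in_cdr(30), in_cdr(5))
```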