Library reference

Python ASN.1 DER/CER/BER codec with abstract structures

This library allows you to marshal various structures in ASN.1 DER/CER format and to unmarshal BER/CER/DER ones.

>>> i = Integer(123)
>>> raw = i.encode()
>>> Integer().decod(raw) == i
True

There are primitive types, holding single values (pyderasn.BitString, pyderasn.Boolean, pyderasn.Enumerated, pyderasn.GeneralizedTime, pyderasn.Integer, pyderasn.Null, pyderasn.ObjectIdentifier, pyderasn.OctetString, pyderasn.UTCTime, various strings (pyderasn.BMPString, pyderasn.GeneralString, pyderasn.GraphicString, pyderasn.IA5String, pyderasn.ISO646String, pyderasn.NumericString, pyderasn.PrintableString, pyderasn.T61String, pyderasn.TeletexString, pyderasn.UniversalString, pyderasn.UTF8String, pyderasn.VideotexString, pyderasn.VisibleString)), constructed types, holding multiple primitive types (pyderasn.Sequence, pyderasn.SequenceOf, pyderasn.Set, pyderasn.SetOf), and special types like pyderasn.Any and pyderasn.Choice.

Common for most types

Tags

Most types in ASN.1 have a specific tag. Obj.tag_default is the default tag used during the coding process. You can override it with either an IMPLICIT tag (using either the impl keyword argument or the impl class attribute) or an EXPLICIT one (using either the expl keyword argument or the expl class attribute). Both arguments take a raw binary string containing that tag. You cannot set implicit and explicit tags simultaneously.

There are pyderasn.tag_ctxp() and pyderasn.tag_ctxc() functions, allowing you to easily create CONTEXT PRIMITIVE/CONSTRUCTED tags, by specifying only the required tag number.

Note

EXPLICIT tags always have a constructed tag. PyDERASN does not explicitly check the correctness of schema input here.

Note

Implicit tags have primitive (tag_ctxp) encoding for primitive values.

>>> Integer(impl=tag_ctxp(1))
[1] INTEGER
>>> Integer(expl=tag_ctxc(2))
[2] EXPLICIT INTEGER

The word “IMPLICIT” is not shown in the representation of an implicitly tagged object.

Two objects of the same type, but with different implicit/explicit tags are not equal.

You can get an object’s effective tag (either the default or the implicit one) through the tag property. You can decode it using the pyderasn.tag_decode() function:

>>> tag_decode(tag_ctxc(123))
(128, 32, 123)
>>> klass, form, num = tag_decode(tag_ctxc(123))
>>> klass == TagClassContext
True
>>> form == TagFormConstructed
True

To determine whether an object has an explicit tag, use the expled boolean property. The expl_tag property returns the explicit tag’s value.
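
The tuple printed by tag_decode() above can be reproduced by hand. The following is a standalone sketch of the identifier octet layout from X.690, not PyDERASN’s own code; the helper names make_ctx_tag and split_tag are hypothetical and exist only for this illustration.

```python
# Class and form bit masks of the identifier octet (X.690, section 8.1.2).
CLASS_CONTEXT = 0x80
FORM_CONSTRUCTED = 0x20


def make_ctx_tag(num, constructed=False):
    """Build the raw octets of a context-specific tag (hypothetical helper)."""
    first = CLASS_CONTEXT | (FORM_CONSTRUCTED if constructed else 0)
    if num < 31:
        # Low tag numbers fit into the five low bits of the first octet.
        return bytes([first | num])
    # High tag numbers: the five low bits are all ones and the number
    # follows in base-128 digits, most significant first, with the top
    # bit set on every octet except the last one.
    digits = [num & 0x7F]
    num >>= 7
    while num > 0:
        digits.append(0x80 | (num & 0x7F))
        num >>= 7
    return bytes([first | 0x1F]) + bytes(reversed(digits))


def split_tag(raw):
    """Return (class, form, number) of a single encoded tag."""
    klass = raw[0] & 0xC0
    form = raw[0] & 0x20
    num = raw[0] & 0x1F
    if num == 0x1F:
        num = 0
        for octet in raw[1:]:
            num = (num << 7) | (octet & 0x7F)
    return klass, form, num


print(make_ctx_tag(1).hex())               # 81
print(make_ctx_tag(2, True).hex())         # a2
print(split_tag(make_ctx_tag(123, True)))  # (128, 32, 123)
```

The last line matches the (128, 32, 123) tuple shown for tag_decode(tag_ctxc(123)): 128 is the context class, 32 is the constructed form bit.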

Default/optional

Many objects in sequences can be OPTIONAL and can have a DEFAULT value. You can specify these properties using the corresponding keyword arguments.

>>> Integer(optional=True, default=123)
INTEGER 123 OPTIONAL DEFAULT

Those specifications do not play any role in primitive value encoding, but are taken into account when dealing with sequences holding them. For example TBSCertificate sequence holds defaulted, explicitly tagged version field:

class Version(Integer):
    schema = (
        ("v1", 0),
        ("v2", 1),
        ("v3", 2),
    )
class TBSCertificate(Sequence):
    schema = (
        ("version", Version(expl=tag_ctxc(0), default="v1")),
    [...]

When the default argument is used and no value is specified, the object’s value equals the default one.

Size constraints

Some objects allow you to set value size constraints. A constraint is either the allowed range of an integer value or the allowed length of various strings and sequences. Constraints are set in the following way:

class X(...):
    bounds = (MIN, MAX)

Values are checked to satisfy MIN <= X <= MAX.

For simplicity you can also set bounds the following way:

bounded_x = X(bounds=(MIN, MAX))

If bounds are not satisfied, then pyderasn.BoundsError is raised.
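
The two ways of setting bounds (class attribute and keyword argument) can be pictured with a small standalone sketch. It does not use PyDERASN: BoundedInt and Percent are hypothetical classes, and the local BoundsError merely stands in for pyderasn.BoundsError.

```python
class BoundsError(ValueError):
    """Stand-in for pyderasn.BoundsError in this standalone sketch."""


class BoundedInt:
    """Hypothetical integer holder with (MIN, MAX) bounds."""

    # Class-level default: effectively unbounded.
    bounds = (float("-inf"), float("+inf"))

    def __init__(self, value, bounds=None):
        if bounds is not None:
            # A keyword argument overrides the class attribute.
            self.bounds = bounds
        low, high = self.bounds
        if not low <= value <= high:
            raise BoundsError("%r is outside of %r" % (value, self.bounds))
        self.value = value


class Percent(BoundedInt):
    # Bounds set once for the whole class.
    bounds = (0, 100)


print(Percent(42).value)                    # 42
print(BoundedInt(7, bounds=(1, 10)).value)  # 7
try:
    Percent(101)
except BoundsError as err:
    print("rejected:", err)  # rejected: 101 is outside of (0, 100)
```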

Common methods

All objects have a ready boolean property that tells whether the object is ready to be encoded. If encoding is attempted on an unready object, the pyderasn.ObjNotReady exception is raised.

All objects are friendly to copy.copy() and copied objects can be safely mutated.

Also all objects can be safely pickle-d, but pay attention that pickling among different PyDERASN versions is prohibited.

Decoding

Decoding is performed using the pyderasn.Obj.decode() method. The optional offset argument can be used to set the initial object’s offset in the binary data, for convenience. The method returns the decoded object and the remaining unmarshalled data (tail). Internally all work is done on memoryview(data), and you can have the tail returned as a memoryview by specifying the leavemm=True argument.

Also note the convenient pyderasn.Obj.decod() method, which immediately checks for a non-empty tail and raises if there is one.

When an object is decoded, its decoded property is true and you can safely use the following properties:

  • offset – position including initial offset where object’s tag starts

  • tlen – length of object’s tag

  • llen – length of object’s length value

  • vlen – length of object’s value

  • tlvlen – length of the whole object

Pay attention that those values do not include anything related to the explicit tag. If you want information about it, use:

  • expled – to know if explicit tag is set

  • expl_offset (it is less than offset)

  • expl_tlen

  • expl_llen

  • expl_vlen (that actually equals the ordinary tlvlen)

  • fulloffset – it equals expl_offset if explicit tag is set, offset otherwise

  • fulllen – it equals expl_tlvlen if explicit tag is set, tlvlen otherwise

When an error occurs, pyderasn.DecodeError is raised.
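
To make the tlen/llen/vlen terminology concrete, here is a standalone sketch that computes those three lengths from raw DER bytes. It is not PyDERASN’s decoder: tlv_lengths is a hypothetical helper that assumes a one-octet tag and a definite length.

```python
def tlv_lengths(data, offset=0):
    """Return (tlen, llen, vlen) of the TLV starting at offset."""
    tlen = 1  # assumption: tag number below 31, so the tag is one octet
    first = data[offset + tlen]
    if first < 0x80:
        # Short form: the octet itself is the length.
        llen, vlen = 1, first
    else:
        count = first & 0x7F
        if count == 0:
            raise ValueError("indefinite length is not handled in this sketch")
        # Long form: the next `count` octets hold the length, big-endian.
        llen = 1 + count
        start = offset + tlen + 1
        vlen = int.from_bytes(data[start:start + count], "big")
    return tlen, llen, vlen


# DER encoding of INTEGER -12345: tag 0x02, length 2, two's complement value.
value = (-12345).to_bytes(2, "big", signed=True)
raw = bytes([0x02, len(value)]) + value
tlen, llen, vlen = tlv_lengths(raw)
print(tlen, llen, vlen, tlen + llen + vlen)  # 1 1 2 4

# [0] EXPLICIT INTEGER 2: the explicit tag wraps the whole inner TLV.
wrapped = bytes.fromhex("a003020102")
expl_tlen, expl_llen, expl_vlen = tlv_lengths(wrapped)
inner = tlv_lengths(wrapped, offset=expl_tlen + expl_llen)
print(expl_vlen == sum(inner))  # True
```

The second check mirrors the statement that expl_vlen equals the ordinary tlvlen of the wrapped object.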

Context

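You can specify the so-called context keyword argument (ctx) during pyderasn.Obj.decode() invocation. It is a dictionary containing various options governing the decoding process.

Currently available context options include those described in this reference: defines_by_path, bered, allow_expl_oob, evgen_mode_upto and keep_memoryview.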

Pretty printing

All objects have a pps() method, a generator of pyderasn.PP namedtuples holding various raw information about the object. If pps is called on a sequence, the PPs of all underlying objects are yielded as well.

You can use the pyderasn.pp_console_row() function to convert those PPs to a human readable string. That same function is used for the repr of all objects, but it is easy to write custom formatters.

>>> from pyderasn import pprint
>>> encoded = Integer(-12345).encode()
>>> obj, tail = Integer().decode(encoded)
>>> print(pprint(obj))
    0   [1,1,   2] INTEGER -12345

Example certificate:

>>> print(pprint(crt))
    0   [1,3,1604] Certificate SEQUENCE
    4   [1,3,1453]  . tbsCertificate: TBSCertificate SEQUENCE
   10-2 [1,1,   1]  . . version: [0] EXPLICIT Version INTEGER v3 OPTIONAL
   13   [1,1,   3]  . . serialNumber: CertificateSerialNumber INTEGER 61595
   18   [1,1,  13]  . . signature: AlgorithmIdentifier SEQUENCE
   20   [1,1,   9]  . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
   31   [0,0,   2]  . . . parameters: [UNIV 5] ANY OPTIONAL
                    . . . . 05:00
   33   [0,0, 278]  . . issuer: Name CHOICE rdnSequence
   33   [1,3, 274]  . . . rdnSequence: RDNSequence SEQUENCE OF
   37   [1,1,  11]  . . . . 0: RelativeDistinguishedName SET OF
   39   [1,1,   9]  . . . . . 0: AttributeTypeAndValue SEQUENCE
   41   [1,1,   3]  . . . . . . type: AttributeType OBJECT IDENTIFIER 2.5.4.6
   46   [0,0,   4]  . . . . . . value: [UNIV 19] AttributeValue ANY
                    . . . . . . . 13:02:45:53
[...]
 1461   [1,1,  13]  . signatureAlgorithm: AlgorithmIdentifier SEQUENCE
 1463   [1,1,   9]  . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
 1474   [0,0,   2]  . . parameters: [UNIV 5] ANY OPTIONAL
                    . . . 05:00
 1476   [1,2, 129]  . signatureValue: BIT STRING 1024 bits
                    . . 68:EE:79:97:97:DD:3B:EF:16:6A:06:F2:14:9A:6E:CD
                    . . 9E:12:F7:AA:83:10:BD:D1:7C:98:FA:C7:AE:D4:0E:2C
 [...]

Trailing data: 0a

Let’s parse that output, human:

10-2 [1,1,   1]    . . version: [0] EXPLICIT Version INTEGER v3 OPTIONAL
^  ^  ^ ^    ^     ^   ^        ^            ^       ^       ^  ^
0  1  2 3    4     5   6        7            8       9       10 11
20   [1,1,   9]    . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
^     ^ ^    ^     ^     ^          ^                 ^
0     2 3    4     5     6          9                 10
33   [0,0, 278]    . . issuer: Name CHOICE rdnSequence
^     ^ ^    ^     ^   ^       ^    ^      ^
0     2 3    4     5   6       8    9      10
52-2∞ B [1,1,1054]∞  . . . . eContent: [0] EXPLICIT BER OCTET STRING 1046 bytes
      ^           ^                                 ^   ^            ^
     12          13                                14   9            10
0:

Offset of the object, where its DER/BER encoding begins. Pay attention that it does not include explicit tag.

1:

If explicit tag exists, then this is its length (tag + encoded length).

2:

Length of object’s tag. For example CHOICE does not have its own tag, so it is zero.

3:

Length of encoded length.

4:

Length of encoded value.

5:

Visual indentation to show the depth of object in the hierarchy.

6:

Object’s name inside SEQUENCE/CHOICE.

7:

If either IMPLICIT or EXPLICIT tag is set, then it will be shown here. “IMPLICIT” is omitted.

8:

Object’s class name, if set. Omitted if it is just an ordinary simple value (like with algorithm in example above).

9:

Object’s ASN.1 type.

10:

Object’s value, if set. Can consist of multiple words (like OCTET/BIT STRINGs above). We see v3 value in Version, because it is named. rdnSequence is the choice of CHOICE type.

11:

Possible other flags like OPTIONAL and DEFAULT, if value equals to the default one, specified in the schema.

12:

Shows whether the object contains any kind of BER encoded data (possibly a Sequence holding a BER-encoded underlying value).

13:

Only applicable to BER encoded data. Indefinite length encoding mark.

14:

Only applicable to BER encoded data. If object has BER-specific encoding, then BER will be shown. It does not depend on indefinite length encoding. EOC, BOOLEAN, BIT STRING, OCTET STRING (and its derivatives), SET, SET OF, UTCTime, GeneralizedTime could be BERed.

Also it could be helpful to add quick ASN.1 pprinting command in your pdb’s configuration file:

alias pp1 import pyderasn ;; print(pyderasn.pprint(%1, oid_maps=(locals().get("OID_STR_TO_NAME", {}),)))

DEFINED BY

ASN.1 structures often have ANY and OCTET STRING fields that are DEFINED BY some previously met ObjectIdentifier. This library lets you specify a mapping between an OID and the specification with which the corresponding field must be decoded.

defines kwarg

A pyderasn.ObjectIdentifier field inside a pyderasn.Sequence can hold a mapping between OIDs and the structures needed for decoding. For example, the CMS (RFC 5652) container:

class ContentInfo(Sequence):
    schema = (
        ("contentType", ContentType(defines=((("content",), {
            id_digestedData: DigestedData(),
            id_signedData: SignedData(),
        }),))),
        ("content", Any(expl=tag_ctxc(0))),
    )

The contentType field declares that content must be decoded with the SignedData specification if contentType equals id-signedData. The same applies to DigestedData. If contentType contains an unknown OID, no automatic decoding is done.

You can specify multiple fields that will be autodecoded, which is why the defines kwarg is a sequence. You can specify the defined field either relative to the current decode path or absolutely. For example, defines for AlgorithmIdentifier of X.509’s tbsCertificate:subjectPublicKeyInfo:algorithm:algorithm:

(
    (("parameters",), {
        id_ecPublicKey: ECParameters(),
        id_GostR3410_2001: GostR34102001PublicKeyParameters(),
    }),
    (("..", "subjectPublicKey"), {
        id_rsaEncryption: RSAPublicKey(),
        id_GostR3410_2001: OctetString(),
    }),
),

tells that if certificate’s SPKI algorithm is GOST R 34.10-2001, then autodecode its parameters inside SPKI’s algorithm and its public key itself.

Fields that can be automatically decoded (DEFINED BY) include ANY and OCTET STRING, as mentioned above.

When any of those fields is automatically decoded, its .defined attribute contains an (OID, value) tuple. OID tells by which OID it was defined, and value contains the corresponding decoded value. In the example above, content_info["content"].defined == (id_signedData, signed_data).
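
The lookup behind DEFINED BY can be pictured as a plain dictionary dispatch. The sketch below is standalone and only mimics the idea: autodecode and the two dummy decoders are hypothetical, and the dotted strings are the id-signedData and id-digestedData OIDs from RFC 5652.

```python
ID_SIGNED_DATA = "1.2.840.113549.1.7.2"
ID_DIGESTED_DATA = "1.2.840.113549.1.7.5"


def decode_signed(raw):
    # Dummy decoder: a real one would parse the SignedData structure.
    return ("SignedData", raw)


def decode_digested(raw):
    # Dummy decoder: a real one would parse the DigestedData structure.
    return ("DigestedData", raw)


DEFINES = {
    ID_SIGNED_DATA: decode_signed,
    ID_DIGESTED_DATA: decode_digested,
}


def autodecode(content_type, content, defines):
    """Return an (OID, value) pair, or None when the OID is unknown."""
    decoder = defines.get(content_type)
    if decoder is None:
        return None  # unknown OID: no automatic decoding is done
    return (content_type, decoder(content))


print(autodecode(ID_SIGNED_DATA, b"\x30\x00", DEFINES))
print(autodecode("1.2.3.4", b"\x30\x00", DEFINES))  # None
```

The returned pair has the same shape as the (OID, value) tuple described for the .defined attribute.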

defines_by_path context option

Sometimes you either can not or do not want to explicitly set defines in the schema. You can dynamically apply those definitions when calling pyderasn.Obj.decode() method.

Specify defines_by_path key in the decode context. Its value must be sequence of following tuples:

(decode_path, defines)

where decode_path is a tuple holding the so-called decode path to the exact pyderasn.ObjectIdentifier field you want to apply defines to, and defines holds exactly the same value as accepted in that field’s defines keyword argument.

For example, again for CMS, you want to automatically decode SignedData and CMC’s (RFC 5272) PKIData and PKIResponse structures it may hold. Also, automatically decode controlSequence of PKIResponse:

content_info = ContentInfo().decod(data, ctx={"defines_by_path": (
    (
        ("contentType",),
        ((("content",), {id_signedData: SignedData()}),),
    ),
    (
        (
            "content",
            DecodePathDefBy(id_signedData),
            "encapContentInfo",
            "eContentType",
        ),
        ((("eContent",), {
            id_cct_PKIData: PKIData(),
            id_cct_PKIResponse: PKIResponse(),
        }),),
    ),
    (
        (
            "content",
            DecodePathDefBy(id_signedData),
            "encapContentInfo",
            "eContent",
            DecodePathDefBy(id_cct_PKIResponse),
            "controlSequence",
            any,
            "attrType",
        ),
        ((("attrValues",), {
            id_cmc_recipientNonce: RecipientNonce(),
            id_cmc_senderNonce: SenderNonce(),
            id_cmc_statusInfoV2: CMCStatusInfoV2(),
            id_cmc_transactionId: TransactionId(),
        }),),
    ),
)})

Pay attention to pyderasn.DecodePathDefBy and any. The former is useful for path construction when some automatic decoding has already been done. any means literally any value it meets, which is useful for SEQUENCE OF/SET OF elements.
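
How a path pattern containing any could be compared with a concrete decode path is sketched below. This is an illustration only, not PyDERASN’s matching code: path_matches is a hypothetical helper, and representing element indexes as strings is an assumption of this sketch.

```python
def path_matches(pattern, decode_path):
    """Compare a pattern with a decode path, treating `any` as a wildcard."""
    if len(pattern) != len(decode_path):
        return False
    # The builtin `any` function itself is used as the wildcard sentinel.
    return all(
        expected is any or expected == actual
        for expected, actual in zip(pattern, decode_path)
    )


pattern = ("controlSequence", any, "attrType")
print(path_matches(pattern, ("controlSequence", "0", "attrType")))   # True
print(path_matches(pattern, ("controlSequence", "17", "attrType")))  # True
print(path_matches(pattern, ("controlSequence", "0", "attrValues"))) # False
```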

BER encoding

By default PyDERASN accepts only DER encoded data and encodes to DER. You can optionally enable BER decoding by setting the bered context argument to True. Indefinite lengths and constructed primitive types should then be parsed successfully.

  • If object is encoded in BER form (not the DER one), then ber_encoded attribute is set to True. Only BOOLEAN, BIT STRING, OCTET STRING, OBJECT IDENTIFIER, SEQUENCE, SET, SET OF, UTCTime, GeneralizedTime can contain it.

  • If object has an indefinite length encoding, then its lenindef attribute is set to True. Only BIT STRING, OCTET STRING, SEQUENCE, SET, SEQUENCE OF, SET OF, ANY can contain it.

  • If object has an indefinite length encoded explicit tag, then expl_lenindef is set to True.

  • If an object has any of the BER-related encodings (explicit tag indefinite length, object’s indefinite length, BER-encoding), or any underlying component has that kind of encoding, then its bered attribute is set to True. For example SignedData CMS can have ContentInfo:content:signerInfos:* bered value set to True, but ContentInfo:content:signerInfos:*:signedAttrs won’t.

The length of the EOC (end-of-contents) token is included in the object’s value length.
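
The following standalone sketch shows what an indefinite-length encoding looks like on the wire and how a value length that counts the EOC octets can be computed, which is how this sketch reads the statement above. indefinite_value_length is a hypothetical helper; it assumes the inner chunks have one-octet tags and short-form definite lengths.

```python
EOC = b"\x00\x00"  # end-of-contents octets


def indefinite_value_length(data, start):
    """Length of an indefinite-length value beginning at start, EOC included."""
    pos = start
    while data[pos:pos + 2] != EOC:
        # Skip one inner TLV: tag octet, length octet, then the value.
        pos += 2 + data[pos + 1]
    return pos + len(EOC) - start


# Constructed OCTET STRING (0x24) with indefinite length (0x80), holding
# two primitive chunks (AB CD and EF) and terminated by EOC.
raw = bytes.fromhex("24800402abcd0401ef0000")
print(raw[1] == 0x80)                   # True: indefinite length marker
print(indefinite_value_length(raw, 2))  # 9
```

Here the nine octets are the two chunks (four and three octets with their headers) plus the two EOC octets.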

Allow explicit tag out-of-bound

Invalid BER encoding could contain EXPLICIT tag containing more than one value, more than one object. If you set allow_expl_oob context option to True, then no error will be raised and that invalid encoding will be silently further processed. But pay attention that offsets and lengths will be invalid in that case.

Warning

This option should be used only for skipping some decode errors, just to see the decoded structure somehow.

Streaming and dealing with huge structures

evgen mode

ASN.1 structures can be huge, they can hold millions of objects inside (for example Certificate Revocation Lists (CRL), holding revocation state for every previously issued X.509 certificate). CACert.org’s 8 MiB CRL file takes more than half a gigabyte of memory to hold the decoded structure.

If you simply want to check the signature over the tbsCertList, you can create a specialized schema with that field represented as an OctetString, for example:

class TBSCertListFast(Sequence):
    schema = (
        [...]
        ("revokedCertificates", OctetString(
            impl=SequenceOf.tag_default,
            optional=True,
        )),
        [...]
    )

This allows you to quickly decode a few fields and check the signature over the tbsCertList bytes.

But how can you get all the certificates’ serial numbers from it, once you trust that CRL after signature validation? You can use the so-called evgen (event generation) mode to catch the events of successful object decoding. Let’s use the command line capabilities:

$ python -m pyderasn --schema tests.test_crl:CertificateList --evgen revoke.crl
     10     [1,1,   1]   . . version: Version INTEGER v2 (01) OPTIONAL
     15     [1,1,   9]   . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.13
     26     [0,0,   2]   . . . parameters: [UNIV 5] ANY OPTIONAL
     13     [1,1,  13]   . . signature: AlgorithmIdentifier SEQUENCE
     34     [1,1,   3]   . . . . . . type: AttributeType OBJECT IDENTIFIER 2.5.4.10
     39     [0,0,   9]   . . . . . . value: [UNIV 19] AttributeValue ANY
     32     [1,1,  14]   . . . . . 0: AttributeTypeAndValue SEQUENCE
     30     [1,1,  16]   . . . . 0: RelativeDistinguishedName SET OF
[...]
    188     [1,1,   1]   . . . . userCertificate: CertificateSerialNumber INTEGER 17 (11)
    191     [1,1,  13]   . . . . . utcTime: UTCTime UTCTime 2003-04-01T14:25:08
    191     [0,0,  15]   . . . . revocationDate: Time CHOICE utcTime
    191     [1,1,  13]   . . . . . utcTime: UTCTime UTCTime 2003-04-01T14:25:08
    186     [1,1,  18]   . . . 0: RevokedCertificate SEQUENCE
    208     [1,1,   1]   . . . . userCertificate: CertificateSerialNumber INTEGER 20 (14)
    211     [1,1,  13]   . . . . . utcTime: UTCTime UTCTime 2002-10-01T02:18:01
    211     [0,0,  15]   . . . . revocationDate: Time CHOICE utcTime
    211     [1,1,  13]   . . . . . utcTime: UTCTime UTCTime 2002-10-01T02:18:01
    206     [1,1,  18]   . . . 1: RevokedCertificate SEQUENCE
[...]
9144992     [0,0,  15]   . . . . revocationDate: Time CHOICE utcTime
9144992     [1,1,  13]   . . . . . utcTime: UTCTime UTCTime 2020-02-08T07:25:06
9144985     [1,1,  20]   . . . 415755: RevokedCertificate SEQUENCE
  181     [1,4,9144821]   . . revokedCertificates: RevokedCertificates SEQUENCE OF OPTIONAL
    5     [1,4,9144997]   . tbsCertList: TBSCertList SEQUENCE
9145009     [1,1,   9]   . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.13
9145020     [0,0,   2]   . . parameters: [UNIV 5] ANY OPTIONAL
9145007     [1,1,  13]   . signatureAlgorithm: AlgorithmIdentifier SEQUENCE
9145022     [1,3, 513]   . signatureValue: BIT STRING 4096 bits
    0     [1,4,9145534]  CertificateList SEQUENCE

Here we see how the decoder works: it decodes a SEQUENCE’s tag and length, then decodes the underlying values. It cannot tell that a SEQUENCE is decoded until all its values are, so the event of the top level SEQUENCE is the last one we see. The version field is just a single INTEGER: it is decoded and its event is fired immediately. Then we see that the algorithm and parameters fields are decoded, and only after them is the signature SEQUENCE fired as successfully decoded. There are 4 events for each revoked certificate entry in that CRL: the userCertificate serial number, the utcTime of the revocationDate CHOICE, the revocationDate CHOICE itself, and the RevokedCertificate itself as one of the entities in the revokedCertificates SEQUENCE OF.

We can do that in ordinary Python code and understand where we are by looking at the deterministically generated decode paths (do not forget about the useful --print-decode-path CLI option). We must use the pyderasn.Obj.decode_evgen() method instead of the ordinary pyderasn.Obj.decode(). It is a generator yielding (decode_path, obj, tail) tuples:

for decode_path, obj, _ in CertificateList().decode_evgen(crl_raw):
    # Match tbsCertList:revokedCertificates:<index>:userCertificate
    if (
        len(decode_path) == 4 and
        decode_path[:2] == ("tbsCertList", "revokedCertificates") and
        decode_path[3] == "userCertificate"
    ):
        print("serial number:", int(obj))

It takes virtually no memory beyond what is needed to store a single object. You can easily use that mode to determine the required object’s .offset and .*len in order to decode it separately, or to verify a signature over it just by taking the bytes at .offset with length .tlvlen.
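
The event order seen in the CLI output (inner objects first, the enclosing SEQUENCE last) follows naturally from a recursive generator. The sketch below is a standalone toy walker, not PyDERASN’s decode_evgen(): walk is a hypothetical function that handles only one-octet tags and short-form lengths, and uses element indexes instead of field names in the paths.

```python
def walk(data, decode_path=()):
    """Yield (decode_path, tag, value length) once each TLV is fully decoded."""
    pos = 0
    index = 0
    while pos < len(data):
        tag = data[pos]
        length = data[pos + 1]  # assumption: short-form lengths only
        value = data[pos + 2:pos + 2 + length]
        path = decode_path + (str(index),)
        if tag & 0x20:
            # Constructed: descend first, so inner events come out first.
            yield from walk(value, path)
        # Only now is this object known to be completely decoded.
        yield path, tag, len(value)
        pos += 2 + length
        index += 1


# SEQUENCE { INTEGER 1, SEQUENCE { INTEGER 2 } }
raw = bytes.fromhex("30080201013003020102")
for path, tag, vlen in walk(raw):
    print(":".join(path), hex(tag), vlen)
# 0:0 0x2 1
# 0:1:0 0x2 1
# 0:1 0x30 3
# 0 0x30 8
```

Because nothing is accumulated between events, the consumer can keep only the objects it is interested in.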

evgen_mode_upto

You can get any kind of data from the CRL in the example above. However, it is not very convenient to reassemble the whole RevokedCertificate structure, which is pretty lightweight and which one may not want to have disassembled. You can use the evgen_mode_upto ctx option, which semantically resembles defines_by_path: a list of decode paths mapped to any non-None value. If a specified decode path is met, then any subsequent objects won’t be decoded in evgen mode. That allows us to parse the CRL above with a fully assembled RevokedCertificate:

for decode_path, obj, _ in CertificateList().decode_evgen(
    crl_raw,
    ctx={"evgen_mode_upto": (
        (("tbsCertList", "revokedCertificates", any), True),
    )},
):
    # Match tbsCertList:revokedCertificates:<index>
    if (
        len(decode_path) == 3 and
        decode_path[:2] == ("tbsCertList", "revokedCertificates")
    ):
        print("serial number:", int(obj["userCertificate"]))

Note

SEQUENCE/SET values with DEFAULT specified are automatically decoded without evgen mode.

mmap-ed file

POSIX compliant systems have the mmap syscall, which lets you work with a memory mapped file. You can deal with the file as if it were an ordinary binary string, without loading it into memory first. You can also use such files as an input for OCTET STRING, taking no Python memory for their storage.

There is convenient pyderasn.file_mmaped() function that creates read-only memoryview on the file contents:

with open("huge", "rb") as fd:
    raw = file_mmaped(fd)
    # decode() is called on an instance and returns (object, tail)
    obj, tail = Something().decode(raw)

Warning

mmap maps the whole file. So it plays no role if you seek-ed it before. Take the slice of the resulting memoryview with required offset instead.
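
The warning above can be checked with the standard library alone. This sketch does not use pyderasn.file_mmaped(): mmap_readonly is a hypothetical stand-in built on Python’s mmap module, and the file contents are made up for the demonstration.

```python
import mmap
import os
import tempfile


def mmap_readonly(fd):
    """Return a read-only memoryview over the whole file (hypothetical helper)."""
    return memoryview(mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ))


# Prepare a small file: six header bytes followed by DER INTEGER 123.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + b"\x02\x01\x7b")
path = tmp.name

with open(path, "rb") as fd:
    fd.seek(6)  # has no effect on what gets mapped
    raw = mmap_readonly(fd)
    whole = bytes(raw)
    payload = bytes(raw[6:])  # slice the view at the required offset instead
    raw.release()
os.remove(path)

print(whole)    # b'HEADER\x02\x01{'
print(payload)  # b'\x02\x01{'
```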

Note

If you use ZFS as the underlying storage, pay attention that currently most platforms do not handle well the combination of ZFS ARC and the ordinary page cache used for mmaps. The data can take twice the necessary size in memory: both in the page cache and in ZFS ARC.

That read-only memoryview can safely be used as a value inside decoded pyderasn.OctetString and pyderasn.Any objects. You can enable that by setting "keep_memoryview": True in the decode context. No OCTET STRING and ANY values will then be copied to memory. Of course that works only for DER encoding, where the value is encoded contiguously.

CER encoding

We can parse any kind of data now, but how can we produce files streamingly, without storing their encoded representation in memory? SEQUENCE by default encodes in memory all its values, joins them in huge binary string, just to know the exact size of SEQUENCE’s value for encoding it in TLV. DER requires you to know all exact sizes of the objects.

You can use CER encoding mode, that slightly differs from the DER, but does not require exact sizes knowledge, allowing streaming encoding directly to some writer/buffer. Just use pyderasn.Obj.encode_cer() method, providing the writer where encoded data will flow:

import io

with open("result", "wb") as fd:
    obj.encode_cer(fd.write)  # stream directly into a file

buf = io.BytesIO()
obj.encode_cer(buf.write)  # or into an in-memory buffer

If you do not want to create in-memory buffer every time, then you can use pyderasn.encode_cer() function:

data = encode_cer(obj)

Remember that CER is not valid DER in most cases, so you have to use the bered ctx option when decoding it. Also, currently there is no validation that the provided CER is valid: you can only be sure that it is valid BER.

Warning

SET OF values cannot be streamingly encoded, because they are required to be sorted byte-by-byte. Big SET OF values will still take much memory. Avoid both SET and SET OF values, as modern ASN.1 also recommends.

Do not forget about using mmap-ed memoryviews for your OCTET STRINGs! They will be streamingly copied from underlying file to the buffer using 1 KB chunks.
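
Chunked streaming of a long OCTET STRING can be sketched with the standard library as follows. This is not PyDERASN’s encode_cer(): write_octet_string and der_length are hypothetical helpers, and the chunk size of 1000 octets is the value X.690 gives for CER, assumed here to correspond to the "1 KB" mentioned above.

```python
import io


def der_length(n):
    """Encode a definite length in the minimal number of octets."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def write_octet_string(writer, payload, chunk=1000):
    """Stream an OCTET STRING to writer, chunking long payloads."""
    if len(payload) <= chunk:
        # Short strings keep the primitive, definite-length form.
        writer(b"\x04" + der_length(len(payload)) + payload)
        return
    writer(b"\x24\x80")  # constructed OCTET STRING, indefinite length
    for start in range(0, len(payload), chunk):
        piece = payload[start:start + chunk]
        writer(b"\x04" + der_length(len(piece)) + piece)
    writer(b"\x00\x00")  # end-of-contents octets


buf = io.BytesIO()
write_octet_string(buf.write, b"\xaa" * 1046)
encoded = buf.getvalue()
print(len(encoded))       # 1056
print(encoded[:6].hex())  # 2480048203e8
```

The writer never sees more than one chunk at a time, so the payload can come from a memoryview over a mapped file.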

Some structures require that some of the elements have to be forcefully DER encoded. For example SignedData CMS requires you to encode SignedAttributes and X.509 certificates in DER form, allowing you to encode everything else in BER. You can tell any of the structures to be forcefully encoded in DER during CER encoding, by specifying der_forced=True attribute:

class Certificate(Sequence):
    schema = (...)
    der_forced = True

class SignedAttributes(SetOf):
    schema = Attribute()
    bounds = (1, float("+inf"))
    der_forced = True

agg_octet_string

In most cases, a huge quantity of binary data is stored as an OCTET STRING. CER encoding splits it into 1 KB chunks. BER allows splitting with various levels of chunk nesting:

SOME STRING[CONSTRUCTED]
    OCTET STRING[CONSTRUCTED]
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
    OCTET STRING[PRIMITIVE]
        DATA CHUNK
    OCTET STRING[CONSTRUCTED]
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
    OCTET STRING[CONSTRUCTED]
        OCTET STRING[CONSTRUCTED]
            OCTET STRING[PRIMITIVE]
                DATA CHUNK

You can not just take the offset and some .vlen of the STRING and treat it as the payload. If you decode it without evgen mode, then it will be automatically aggregated and bytes() will give the whole payload contents.

For a small memory footprint you have to use evgen mode for decoding. There is a convenient pyderasn.agg_octet_string() helper for reconstructing the payload. Let’s assume you have a BER/CER encoded ContentInfo with a huge SignedData and EncapsulatedContentInfo. Let’s calculate the SHA512 digest of its eContent:

from hashlib import sha512
from pyderasn import agg_octet_string, file_mmaped

fd = open("data.p7m", "rb")
raw = file_mmaped(fd)
ctx = {"bered": True}
# Locate the content field without keeping the whole structure in memory
for decode_path, obj, _ in ContentInfo().decode_evgen(raw, ctx=ctx):
    if decode_path == ("content",):
        content = obj
        break
else:
    raise ValueError("no content found")
hasher_state = sha512()
def hasher(data):
    # Writer-like callable: consume the chunk, report how much was taken
    hasher_state.update(data)
    return len(data)
evgens = SignedData().decode_evgen(
    raw[content.offset:],
    offset=content.offset,
    ctx=ctx,
)
agg_octet_string(evgens, ("encapContentInfo", "eContent"), raw, hasher)
fd.close()
digest = hasher_state.digest()

Simply replace hasher with some writable file’s fd.write to copy the payload (without the interleaved BER/CER encoding overhead) into it. It takes virtually no more memory than is needed for keeping small structures and 1 KB binary chunks.
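
Why the payload cannot be taken as a single slice, and how it can be reassembled chunk by chunk, is illustrated by the standalone sketch below. It is not pyderasn.agg_octet_string(): aggregate is a hypothetical recursive helper that assumes one-octet tags and either short-form or indefinite lengths.

```python
from hashlib import sha512


def aggregate(data, pos, sink):
    """Feed primitive chunks of the OCTET STRING at pos to sink; return its end."""
    tag = data[pos]
    length = data[pos + 1]  # assumption: short form, or 0x80 for indefinite
    pos += 2
    if not tag & 0x20:
        # Primitive chunk: this is actual payload.
        sink(data[pos:pos + length])
        return pos + length
    if length == 0x80:
        # Indefinite length: chunks follow until end-of-contents octets.
        while data[pos:pos + 2] != b"\x00\x00":
            pos = aggregate(data, pos, sink)
        return pos + 2
    end = pos + length
    while pos < end:
        pos = aggregate(data, pos, sink)
    return end


# Indefinite-length constructed string holding a nested constructed
# string ("a", "b") followed by a primitive chunk ("c").
raw = bytes.fromhex("24802406040161040162040163" "0000")
state = sha512()
end = aggregate(raw, 0, state.update)
print(end == len(raw))                                     # True
print(state.hexdigest() == sha512(b"abc").hexdigest())     # True
```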

SEQUENCE OF iterators

You can use iterators as a value in pyderasn.SequenceOf classes. The only difference from providing the full list of objects is that type and bounds checking is done during the encoding process. Also, the sequence’s value will be emptied after encoding, forcing you to set its value again.

This is very useful when you have to create huge objects, like CRLs, with thousands or millions of entries inside. You can write a generator that takes the necessary data from a database and yields RevokedCertificate objects. Only the binary representation of those objects will take memory during DER encoding.

2-pass DER encoding

It is possible to do 2-pass encoding to DER, writing results directly to a specified writer (buffer, file, whatever). It can be 1.5+ times slower than ordinary encoding, but it takes little memory for storing the 1st pass state. For example, the 1st pass state for CACert.org’s CRL with ~416K certificate entries takes nearly 3.5 MB of memory. SignedData with a several gigabyte EncapsulatedContentInfo takes nearly 0.5 KB of memory.

If you use mmap-ed memoryviews, SEQUENCE OF iterators and write directly to an opened file, then the memory footprint is very small.

1st pass traverses through all the objects of the structure and returns the size of DER encoded structure, together with 1st pass state object. That state contains precalculated lengths for various objects inside the structure.

fulllen, state = obj.encode1st()

2nd pass takes the writer and 1st pass state. It traverses through all the objects again, but writes their encoded representation to the writer.

with open("result", "wb") as fd:
    obj.encode2nd(fd.write, iter(state))

Warning

You MUST NOT use 1st pass state if anything is changed in the objects. It is intended to be used immediately after 1st pass is done!

If you use SEQUENCE OF iterators, then you have to reinitialize the values after the 1st pass. And you have to be sure that the iterator gives exactly the same values as previously. Yes, you have to run your iterator twice – because this is two pass encoding mode.

If you want to encode to the memory, then you can use convenient pyderasn.encode2pass() helper.
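
The idea of the two passes can be shown on a toy structure with the standard library only. This sketch is not PyDERASN’s encode1st()/encode2nd(): first_pass and second_pass are hypothetical functions, a bytes node stands for an OCTET STRING, a list node stands for a SEQUENCE, and the state is a plain list rather than an array.

```python
import io


def der_length(n):
    """Encode a definite length in the minimal number of octets."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def first_pass(node, state):
    """Return the full TLV length of node, recording SEQUENCE value lengths."""
    if isinstance(node, bytes):
        vlen = len(node)
    else:
        slot = len(state)
        state.append(0)  # reserve the slot before descending
        vlen = sum(first_pass(child, state) for child in node)
        state[slot] = vlen
    return 1 + len(der_length(vlen)) + vlen


def second_pass(node, writer, state_iter):
    """Write node using the lengths precalculated by the first pass."""
    if isinstance(node, bytes):
        writer(b"\x04" + der_length(len(node)) + node)
        return
    # The SEQUENCE header can be written before any child is encoded.
    writer(b"\x30" + der_length(next(state_iter)))
    for child in node:
        second_pass(child, writer, state_iter)


tree = [b"ab", [b"c", b"d"]]
state = []
fulllen = first_pass(tree, state)
buf = io.BytesIO()
second_pass(tree, buf.write, iter(state))
print(fulllen, state)        # 14 [12, 6]
print(buf.getvalue().hex())  # 300c04026162300604016304016 4
```

The second pass consumes the state in the same order the first pass produced it, which is why the structure must not change between the passes.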

ASN.1 browser

pyderasn.browse(raw, obj, oid_maps=())

Interactive browser

Parameters:
  • raw (bytes) – binary data you decoded

  • obj – decoded pyderasn.Obj

  • oid_maps – list of str(OID) <-> human readable string dictionaries. Its human readable form is printed when OID is met

Note

urwid dependency required

This browser is an interactive terminal application for browsing structures of your decoded ASN.1 objects. You can quit it with q key. It consists of three windows:

Tree:

View of ASN.1 elements hierarchy. You can navigate it using Up, Down, PageUp, PageDown, Home, End keys. Left key goes to constructed element above. Plus/Minus keys collapse/uncollapse constructed elements. Space toggles it

Info:

window with various information about element. You can scroll it with h/l (down, up) (H/L for triple speed) keys

Hexdump:

window with raw data hexdump and highlighted current element’s contents. It automatically focuses on element’s data. You can scroll it with j/k (down, up) (J/K for triple speed) keys. If element has explicit tag, then it also will be highlighted with different colour

Window’s header contains current decode path and progress bars with position in info and hexdump windows.

If you press d, then current element will be saved in the current directory under its decode path name (adding “.0”, “.1”, etc suffix if such file already exists). D will save it with explicit tag.

You can also invoke it with --browse command line argument.

Base Obj

class pyderasn.Obj(impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

Common ASN.1 object class

All ASN.1 types are inherited from it. It has metaclass that automatically adds __slots__ to all inherited classes.

property bered

Is either the object or any element inside it BER encoded?

decod(data, offset=0, decode_path=(), ctx=None)

Decode the data, check that tail is empty

Raises:

ExceedingData – if tail is not empty

This is just a wrapper over pyderasn.Obj.decode() (decode without tail) that also checks that there is no trailing data left.

decode(data, offset=0, leavemm=False, decode_path=(), ctx=None, tag_only=False, _ctx_immutable=True)

Decode the data

Parameters:
  • data – either binary or memoryview

  • offset (int) – initial data’s offset

  • leavemm (bool) – do we need to leave memoryview of remaining data as is, or convert it to bytes otherwise

  • decode_path – current decode path (tuples of strings, possibly with DecodePathDefBy) which will be the root for all underlying objects

  • ctx – optional context governing decoding process

  • tag_only (bool) – decode only the tag, without length and contents (used only in Choice and Set structures, trying to determine if tag satisfies the schema)

  • _ctx_immutable (bool) – do we need to copy.copy() ctx before using it?

Returns:

(Obj, remaining data)

See also

Decoding

decode_evgen(data, offset=0, leavemm=False, decode_path=(), ctx=None, tag_only=False, _ctx_immutable=True, _evgen_mode=True)

Decode with evgen mode on

That method is identical to pyderasn.Obj.decode(), but it returns the generator producing (decode_path, obj, tail) values.

See also

evgen mode

property decoded

Is object decoded?

encode()

DER encode the structure

Returns:

DER representation

encode1st(state=None)

Do the 1st pass of 2-pass encoding

Return type:

(int, array(“L”))

Returns:

full length of encoded data and precalculated various objects lengths

encode2nd(writer, state_iter)

Do the 2nd pass of 2-pass encoding

Parameters:
  • writer – must comply with io.RawIOBase.write behaviour

  • state_iter – iterator over the 1st pass state (iter(state))
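
The idea behind two-pass encoding can be sketched on a toy tree: the first pass only measures and records content lengths, the second pass writes the bytes while reading those lengths back through an iterator. The names below (first_pass, second_pass, encode_len) and the tuple-based tree are assumptions made for illustration, and tags are limited to a single byte. It is not how PyDERASN stores its state.

```python
import io


def encode_len(n):
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def first_pass(node, state):
    """Record content lengths in pre-order; return the full TLV length."""
    tag, payload = node
    idx = len(state)
    state.append(0)  # placeholder, filled in once the children are measured
    if isinstance(payload, bytes):
        vlen = len(payload)
    else:
        vlen = sum(first_pass(child, state) for child in payload)
    state[idx] = vlen
    return 1 + len(encode_len(vlen)) + vlen


def second_pass(node, writer, state_iter):
    """Write the tree, taking each content length from the first pass."""
    tag, payload = node
    vlen = next(state_iter)
    writer(bytes([tag]) + encode_len(vlen))
    if isinstance(payload, bytes):
        writer(payload)
    else:
        for child in payload:
            second_pass(child, writer, state_iter)


# SEQUENCE { INTEGER 123, OCTET STRING "foo" }
tree = (0x30, [(0x02, b"\x7b"), (0x04, b"foo")])
state = []
full_len = first_pass(tree, state)
buf = io.BytesIO()
second_pass(tree, buf.write, iter(state))
```

Nothing is buffered per element in the second pass, which is the point of the scheme: the lengths are known up front, so the data can go straight to the writer.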

encode_cer(writer)

CER encode the structure to specified writer

Parameters:

writer – must comply with io.RawIOBase.write behaviour. It takes slice to be written and returns number of bytes processed. If it returns None, then exception will be raised
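
A writer in the sense described above is any callable with io.RawIOBase.write semantics. As a rough illustration, the hypothetical CountingWriter below wraps a stream, counts the bytes accepted and fails when the underlying write returns None. The class name and the choice of exception are assumptions.

```python
import io


class CountingWriter:
    """Hypothetical writer: forwards to a raw stream and counts bytes."""

    def __init__(self, raw):
        self.raw = raw
        self.total = 0

    def write(self, chunk):
        written = self.raw.write(chunk)
        if written is None:
            # A raw stream returns None when it could not accept any bytes.
            raise RuntimeError("writer returned None")
        self.total += written
        return written


sink = io.BytesIO()
writer = CountingWriter(sink)
for chunk in (b"\x04\x03", memoryview(b"foo")):
    writer.write(chunk)
```

Under these assumptions, writer.write (or simply sink.write) is the kind of object that would be passed as the writer argument.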

property expl_llen

See also

Decoding

property expl_offset

See also

Decoding

property expl_tag

See also

Decoding

property expl_tlen

See also

Decoding

property expl_tlvlen

See also

Decoding

property expl_vlen

See also

Decoding

property expled

See also

Decoding

property fulllen

See also

Decoding

property fulloffset

See also

Decoding

hexdecod(data, *args, **kwargs)

Do pyderasn.Obj.decod() with hexadecimal decoded data

hexdecode(data, *args, **kwargs)

Do pyderasn.Obj.decode() with hexadecimal decoded data

hexencode()

Do hexadecimal encoded pyderasn.Obj.encode()

property ready

Is object ready to be encoded?

property tag_order

Tag’s (class, number) used for DER/CER sorting

property tlen

See also

Decoding

property tlvlen

See also

Decoding

Primitive types

Boolean

class pyderasn.Boolean(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

BOOLEAN boolean type

>>> b = Boolean(True)
BOOLEAN True
>>> b == Boolean(True)
True
>>> bool(b)
True
__init__(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))
Parameters:
  • value – set the value. Either boolean type, or pyderasn.Boolean object

  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence

Integer

class pyderasn.Integer(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))

INTEGER integer type

>>> b = Integer(-123)
INTEGER -123
>>> b == Integer(-123)
True
>>> int(b)
-123
>>> Integer(2, bounds=(1, 3))
INTEGER 2
>>> Integer(5, bounds=(1, 3))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 1 <= 5 <= 3
class Version(Integer):
    schema = (
        ("v1", 0),
        ("v2", 1),
        ("v3", 2),
    )
>>> v = Version("v1")
Version INTEGER v1
>>> int(v)
0
>>> v.named
'v1'
>>> v.specs
{'v3': 2, 'v1': 0, 'v2': 1}
__init__(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))
Parameters:
  • value – set the value. Either integer type, named value (if schema is specified in the class), or pyderasn.Integer object

  • bounds – set (MIN, MAX) value constraint. (-inf, +inf) by default

  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence

property named

Return named representation (if exists) of the value

tohex()

Hexadecimal representation

Use pyderasn.colonize_hex() for colonizing it.
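
For orientation, the DER form of an INTEGER is the tag 0x02, a length and the minimal two's complement representation of the value. The stdlib-only sketch below shows that rule. der_integer is a hypothetical helper, not part of the library, and it handles short-form lengths only.

```python
def der_integer(value):
    """Encode an int as a DER INTEGER (minimal two's complement)."""
    # Count the magnitude bits, then reserve one more bit for the sign.
    magnitude = value if value >= 0 else ~value
    length = (magnitude.bit_length() + 8) // 8
    if length >= 0x80:
        raise ValueError("long-form lengths are out of scope for this sketch")
    return bytes([0x02, length]) + value.to_bytes(length, "big", signed=True)
```

For example, der_integer(1193046) produces the octets 02 03 12 34 56.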

BitString

class pyderasn.BitString(value=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))

BIT STRING bit string type

>>> b = BitString(b"hello world")
BIT STRING 88 bits 68656c6c6f20776f726c64
>>> bytes(b)
b'hello world'
>>> b == b"hello world"
True
>>> b.bit_len
88
>>> BitString("'0A3B5F291CD'H")
BIT STRING 44 bits 0a3b5f291cd0
>>> b = BitString("'010110000000'B")
BIT STRING 12 bits 5800
>>> b.bit_len
12
>>> b[0], b[1], b[2], b[3]
(False, True, False, True)
>>> b[1000]
False
>>> [v for v in b]
[False, True, False, True, True, False, False, False, False, False, False, False]
class KeyUsage(BitString):
    schema = (
        ("digitalSignature", 0),
        ("nonRepudiation", 1),
        ("keyEncipherment", 2),
    )
>>> b = KeyUsage(("keyEncipherment", "nonRepudiation"))
KeyUsage BIT STRING 3 bits nonRepudiation, keyEncipherment
>>> b.named
['nonRepudiation', 'keyEncipherment']
>>> b.specs
{'nonRepudiation': 1, 'digitalSignature': 0, 'keyEncipherment': 2}

Note

Pay attention that BIT STRING can be encoded both in primitive and constructed forms. Decoder always checks constructed form tag additionally to specified primitive one. If BER decoding is not enabled, then decoder will fail, because of DER restrictions.

__init__(value=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))
Parameters:
  • value – set the value. Either binary type, tuple of named values (if schema is specified in the class), string in 'XXX...'B form, or pyderasn.BitString object

  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence

property bit_len

Returns number of bits in the string

property named

Named representation (if exists) of the bits

Returns:

[str(name), …]
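
The 'XXX'B and 'XXX'H literal forms used in the examples above map to a bit length plus octets padded with zero bits. A minimal sketch of that mapping follows. parse_bit_literal is a hypothetical helper written for illustration and may differ from the library's own parsing.

```python
def parse_bit_literal(text):
    """Turn "'0101'B" or "'0A3B'H" into (bit_len, packed bytes)."""
    if not (len(text) >= 3 and text.startswith("'") and text[-2] == "'"):
        raise ValueError("expected 'XXX'B or 'XXX'H")
    digits, kind = text[1:-2], text[-1]
    if kind == "B":
        bits = digits
    elif kind == "H":
        # Every hexadecimal digit contributes exactly four bits.
        bits = "".join(format(int(c, 16), "04b") for c in digits)
    else:
        raise ValueError("unknown literal kind: %r" % kind)
    if set(bits) - {"0", "1"}:
        raise ValueError("invalid binary digit")
    padded = bits + "0" * (-len(bits) % 8)  # pad the last octet with zero bits
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return len(bits), packed
```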

OctetString

class pyderasn.OctetString(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None)

OCTET STRING binary string type

>>> s = OctetString(b"hello world")
OCTET STRING 11 bytes 68656c6c6f20776f726c64
>>> s == OctetString(b"hello world")
True
>>> bytes(s)
b'hello world'
>>> OctetString(b"hello", bounds=(4, 4))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 4 <= 5 <= 4
>>> OctetString(b"hell", bounds=(4, 4))
OCTET STRING 4 bytes 68656c6c

Memoryviews can be used as values. If a memoryview is made on an mmap-ed file, then it does not take storage inside OctetString itself. In CER encoding mode it will be streamed to the specified writer, copying 1 KB chunks.

__init__(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None)
Parameters:
  • value – set the value. Either binary type, or pyderasn.OctetString object

  • bounds – set (MIN, MAX) value size constraint. (-inf, +inf) by default

  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence

Null

class pyderasn.Null(value=None, impl=None, expl=None, optional=False, _decoded=(0, 0, 0))

NULL null object

>>> n = Null()
NULL
>>> n.ready
True
__init__(value=None, impl=None, expl=None, optional=False, _decoded=(0, 0, 0))
Parameters:
  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • optional (bool) – is object OPTIONAL in sequence

ObjectIdentifier

class pyderasn.ObjectIdentifier(value=None, defines=(), impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

OBJECT IDENTIFIER OID type

>>> oid = ObjectIdentifier((1, 2, 3))
OBJECT IDENTIFIER 1.2.3
>>> oid == ObjectIdentifier("1.2.3")
True
>>> tuple(oid)
(1, 2, 3)
>>> str(oid)
'1.2.3'
>>> oid + (4, 5) + ObjectIdentifier("1.7")
OBJECT IDENTIFIER 1.2.3.4.5.1.7
>>> str(ObjectIdentifier((3, 1)))
Traceback (most recent call last):
pyderasn.InvalidOID: unacceptable first arc value
__init__(value=None, defines=(), impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))
Parameters:
  • value – set the value. Either tuple of integers, string of “.”-concatenated integers, or pyderasn.ObjectIdentifier object

  • defines

    sequence of tuples. Each tuple has two elements. The first is a decode path, relative to the current one, pointing to the field defined by that OID. Read about relative paths in pyderasn.abs_decode_path(). The second is an {OID: pyderasn.Obj()} dictionary, mapping the current OID value to the structure applied to the defined field.

    See also

    DEFINED BY

  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence
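
As background for the first arc check shown in the example above, here is a sketch of how a dotted OID maps to DER octets: the first two arcs are packed as 40 * first + second, and every value is written in base 128 with a continuation bit. der_oid and base128 are hypothetical helpers, and short-form lengths are assumed.

```python
def base128(n):
    """Big-endian base 128, continuation bit set on all but the last octet."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(out))


def der_oid(dotted):
    """Encode a dotted OID string as a DER OBJECT IDENTIFIER."""
    arcs = tuple(int(a) for a in dotted.split("."))
    if len(arcs) < 2 or any(a < 0 for a in arcs):
        raise ValueError("at least two non-negative arcs are required")
    first, second = arcs[0], arcs[1]
    if first not in (0, 1, 2):
        raise ValueError("unacceptable first arc value")
    if first < 2 and second > 39:
        raise ValueError("unacceptable second arc value")
    body = base128(40 * first + second)
    body += b"".join(base128(a) for a in arcs[2:])
    if len(body) >= 0x80:
        raise ValueError("long-form lengths are out of scope for this sketch")
    return bytes([0x06, len(body)]) + body
```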

Enumerated

class pyderasn.Enumerated(value=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0), bounds=None)

ENUMERATED integer type

This type is identical to pyderasn.Integer, but requires schema to be specified and does not accept values missing from it.

CommonString

class pyderasn.CommonString(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None)

Common class for all strings

Everything resembles pyderasn.OctetString, except ability to deal with unicode text strings.

>>> hexenc("привет мир".encode("utf-8"))
'd0bfd180d0b8d0b2d0b5d18220d0bcd0b8d180'
>>> UTF8String("привет мир") == UTF8String(hexdec("d0...80"))
True
>>> s = UTF8String("привет мир")
UTF8String UTF8String привет мир
>>> str(s)
'привет мир'
>>> hexenc(bytes(s))
'd0bfd180d0b8d0b2d0b5d18220d0bcd0b8d180'
>>> PrintableString("привет мир")
Traceback (most recent call last):
pyderasn.DecodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
>>> BMPString("ада", bounds=(2, 2))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 2 <= 3 <= 2
>>> s = BMPString("ад", bounds=(2, 2))
>>> s.encoding
'utf-16-be'
>>> hexenc(bytes(s))
'04300434'

Class                                          Text encoding, validation
pyderasn.UTF8String                            utf-8
pyderasn.NumericString                         proper alphabet validation
pyderasn.PrintableString                       proper alphabet validation
pyderasn.TeletexString                         iso-8859-1
pyderasn.T61String                             iso-8859-1
pyderasn.VideotexString                        iso-8859-1
pyderasn.IA5String                             proper alphabet validation
pyderasn.GraphicString                         iso-8859-1
pyderasn.VisibleString, pyderasn.ISO646String  proper alphabet validation
pyderasn.GeneralString                         iso-8859-1
pyderasn.UniversalString                       utf-32-be
pyderasn.BMPString                             utf-16-be
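
The effect of the per-class text encodings listed above can be reproduced with Python's own codecs. The mapping below covers only a few classes, and both the CODECS dictionary and to_wire are hypothetical names used for illustration.

```python
# Codec names are Python's; the pairing follows the class list above.
CODECS = {
    "UTF8String": "utf-8",
    "TeletexString": "iso-8859-1",
    "UniversalString": "utf-32-be",
    "BMPString": "utf-16-be",
}


def to_wire(kind, text):
    """Return the content octets a string of the given kind would carry."""
    return text.encode(CODECS[kind])
```

With this sketch, two Cyrillic letters take four octets as a BMPString and eight as a UniversalString.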

NumericString

class pyderasn.NumericString(*args, **kwargs)

Numeric string

Its value is properly sanitized: only ASCII digits with spaces can be stored.

>>> NumericString().allowable_chars
frozenset(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ' '])

PrintableString

class pyderasn.PrintableString(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None, allow_asterisk=False, allow_ampersand=False)

Printable string

Its value is properly sanitized: see X.680 41.4 table 10.

>>> PrintableString().allowable_chars
frozenset([' ', "'", ..., 'z'])
>>> obj = PrintableString("foo*bar", allow_asterisk=True)
PrintableString PrintableString foo*bar
>>> obj.allow_asterisk, obj.allow_ampersand
(True, False)
__init__(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None, allow_asterisk=False, allow_ampersand=False)
Parameters:
  • allow_asterisk – allow asterisk character

  • allow_ampersand – allow ampersand character

property allow_ampersand

Is ampersand character allowed?

property allow_asterisk

Is asterisk character allowed?
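
Alphabet validation with the two opt-in characters can be sketched as follows. PRINTABLE lists the PrintableString alphabet from X.680, while check_printable is a hypothetical helper. The library's own error type and messages will differ.

```python
import string

# X.680 PrintableString alphabet: letters, digits, space and '()+,-./:=?
PRINTABLE = frozenset(string.ascii_letters + string.digits + " '()+,-./:=?")


def check_printable(text, allow_asterisk=False, allow_ampersand=False):
    """Raise ValueError on the first character outside the alphabet."""
    allowed = set(PRINTABLE)
    if allow_asterisk:
        allowed.add("*")
    if allow_ampersand:
        allowed.add("&")
    for pos, char in enumerate(text):
        if char not in allowed:
            raise ValueError("character %r at position %d is not allowed" % (char, pos))
    return text
```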

IA5String

class pyderasn.IA5String(*args, **kwargs)

IA5 string

Its value is properly sanitized: it is just 7-bit ASCII.

>>> IA5String().allowable_chars
frozenset(["NUL", ... "DEL"])

VisibleString

class pyderasn.VisibleString(*args, **kwargs)

Visible string

Its value is properly sanitized. ASCII subset from space to tilde is allowed: http://www.itscj.ipsj.or.jp/iso-ir/006.pdf

>>> VisibleString().allowable_chars
frozenset([" ", ... "~"])

UTCTime

class pyderasn.UTCTime(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)

UTCTime datetime type

>>> t = UTCTime(datetime(2017, 9, 30, 22, 7, 50, 123))
UTCTime UTCTime 2017-09-30T22:07:50
>>> str(t)
'170930220750Z'
>>> bytes(t)
b'170930220750Z'
>>> t.todatetime()
datetime.datetime(2017, 9, 30, 22, 7, 50)
>>> UTCTime(datetime(2057, 9, 30, 22, 7, 50)).todatetime()
datetime.datetime(1957, 9, 30, 22, 7, 50)
>>> UTCTime(datetime(2057, 9, 30, 22, 7, 50)).totzdatetime()
datetime.datetime(1957, 9, 30, 22, 7, 50, tzinfo=tzutc())

If BER encoded value was met, then ber_raw attribute will hold its raw representation.

Warning

Only naive datetime objects are supported. Library assumes that all work is done in UTC.

Warning

Pay attention that UTCTime can not hold the full year: two-digit years below 50 are treated as 20xx, and as 19xx otherwise, according to the X.509 recommendation. Use GeneralizedTime instead to remove the ambiguity.

Warning

UTC offsets (applicable only to BER) are not strictly validated. Only crude checks are made:

  • minutes are not exceeding 60

  • offset value is not exceeding 14 hours
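
The two-digit year rule mentioned in the warnings can be made concrete with a small parser for the DER form YYMMDDHHMMSSZ. parse_utctime is a hypothetical stdlib-only helper, and BER variants with offsets are deliberately not handled.

```python
from datetime import datetime


def parse_utctime(text):
    """Parse YYMMDDHHMMSSZ into a naive datetime, assumed to be UTC."""
    if len(text) != 13 or not text.endswith("Z") or not text[:-1].isdigit():
        raise ValueError("expected YYMMDDHHMMSSZ")
    yy, month, day, hour, minute, second = (
        int(text[i:i + 2]) for i in range(0, 12, 2)
    )
    # Pivot: two-digit years below 50 belong to 20xx, the rest to 19xx.
    year = 2000 + yy if yy < 50 else 1900 + yy
    return datetime(year, month, day, hour, minute, second)
```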

__init__(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)
Parameters:
  • value – set the value. Either datetime type, or pyderasn.UTCTime object

  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence

GeneralizedTime

class pyderasn.GeneralizedTime(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)

GeneralizedTime datetime type

This type is similar to pyderasn.UTCTime.

>>> t = GeneralizedTime(datetime(2017, 9, 30, 22, 7, 50, 123))
GeneralizedTime GeneralizedTime 2017-09-30T22:07:50.000123
>>> str(t)
'20170930220750.000123Z'
>>> t = GeneralizedTime(datetime(2057, 9, 30, 22, 7, 50))
GeneralizedTime GeneralizedTime 2057-09-30T22:07:50

Warning

Only naive datetime objects are supported. Library assumes that all work is done in UTC.

Warning

Only microsecond fractions are supported in DER encoding. pyderasn.DecodeError will be raised during decoding of higher precision values.

Warning

BER encoded data can lose information (accuracy) during decoding because of float transformations.

Warning

Zero year is unsupported.
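
The textual form shown in the examples above (four-digit year, optional fraction, trailing Z) can be approximated with the standard library. der_generalized_time is a hypothetical helper. Stripping trailing zeros from the fraction follows the DER rule, and it is an assumption that the library formats values the same way.

```python
from datetime import datetime


def der_generalized_time(moment):
    """Format a naive datetime (assumed UTC) as YYYYMMDDHHMMSS[.f]Z."""
    if moment.tzinfo is not None:
        raise ValueError("only naive datetime objects are handled here")
    text = moment.strftime("%Y%m%d%H%M%S")
    if moment.microsecond:
        # DER forbids trailing zeros in the fractional part.
        text += (".%06d" % moment.microsecond).rstrip("0")
    return text + "Z"
```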

__init__(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)
Parameters:
  • value – set the value. Either datetime type, or pyderasn.UTCTime object

  • impl (bytes) – override default tag with IMPLICIT one

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence

Special types

Choice

class pyderasn.Choice(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

CHOICE special type

class GeneralName(Choice):
    schema = (
        ("rfc822Name", IA5String(impl=tag_ctxp(1))),
        ("dNSName", IA5String(impl=tag_ctxp(2))),
    )
>>> gn = GeneralName()
GeneralName CHOICE
>>> gn["rfc822Name"] = IA5String("foo@bar.baz")
GeneralName CHOICE rfc822Name[[1] IA5String IA5 foo@bar.baz]
>>> gn["dNSName"] = IA5String("bar.baz")
GeneralName CHOICE dNSName[[2] IA5String IA5 bar.baz]
>>> gn["rfc822Name"]
None
>>> gn["dNSName"]
[2] IA5String IA5 bar.baz
>>> gn.choice
'dNSName'
>>> gn.value == gn["dNSName"]
True
>>> gn.specs
OrderedDict([('rfc822Name', [1] IA5String IA5), ('dNSName', [2] IA5String IA5)])
>>> GeneralName(("rfc822Name", IA5String("foo@bar.baz")))
GeneralName CHOICE rfc822Name[[1] IA5String IA5 foo@bar.baz]
__init__(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))
Parameters:
  • value – set the value. Either (choice, value) tuple, or pyderasn.Choice object

  • impl (bytes) – can not be set, do not use it

  • expl (bytes) – override default tag with EXPLICIT one

  • default – set default value. Type same as in value

  • optional (bool) – is object OPTIONAL in sequence

property choice

Name of the choice

property value

Value of underlying choice

PrimitiveTypes

class pyderasn.PrimitiveTypes(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

Predefined CHOICE for all generic primitive types

It could be useful for general decoding of some unspecified values:

>>> PrimitiveTypes().decod(hexdec("0403666f6f")).value
OCTET STRING 3 bytes 666f6f
>>> PrimitiveTypes().decod(hexdec("0203123456")).value
INTEGER 1193046
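
Conceptually, such a catch-all CHOICE picks the alternative whose tag matches the first octet of the input. The toy dispatcher below illustrates that idea for three universal tags. DECODERS and decode_any_primitive are hypothetical, and only short-form lengths are supported.

```python
def decode_integer(body):
    """Interpret content octets as a signed big-endian integer."""
    return int.from_bytes(body, "big", signed=True)


# Hypothetical registry: universal tag octet -> (name, content decoder).
DECODERS = {
    0x02: ("INTEGER", decode_integer),
    0x04: ("OCTET STRING", bytes),
    0x05: ("NULL", lambda body: None),
}


def decode_any_primitive(raw):
    """Pick a decoder by the first tag octet."""
    if len(raw) < 2:
        raise ValueError("not enough data")
    tag, length, body = raw[0], raw[1], raw[2:]
    if length >= 0x80 or len(body) != length:
        raise ValueError("unsupported or inconsistent length")
    if tag not in DECODERS:
        raise ValueError("no alternative matches tag 0x%02x" % tag)
    name, decoder = DECODERS[tag]
    return name, decoder(body)
```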

Any

class pyderasn.Any(value=None, expl=None, optional=False, _decoded=(0, 0, 0))

ANY special type

>>> Any(Integer(-123))
ANY INTEGER -123 (0X:7B)
>>> a = Any(OctetString(b"hello world").encode())
ANY 040b68656c6c6f20776f726c64
>>> hexenc(bytes(a))
'040b68656c6c6f20776f726c64'
__init__(value=None, expl=None, optional=False, _decoded=(0, 0, 0))
Parameters:
  • value – set the value. Either any kind of pyderasn’s ready object, or bytes. Pay attention that a raw binary value is not validated as a proper TLV, apart from decoding its tag

  • expl (bytes) – override default tag with EXPLICIT one

  • optional (bool) – is object OPTIONAL in sequence

Constructed types

Sequence

class pyderasn.Sequence(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

SEQUENCE structure type

You have to specify the schema of the sequence:

class Extension(Sequence):
    schema = (
        ("extnID", ObjectIdentifier()),
        ("critical", Boolean(default=False)),
        ("extnValue", OctetString()),
    )

Then, you can work with it as with dictionary.

>>> ext = Extension()
>>> Extension().specs
OrderedDict([
    ('extnID', OBJECT IDENTIFIER),
    ('critical', BOOLEAN False OPTIONAL DEFAULT),
    ('extnValue', OCTET STRING),
])
>>> ext["extnID"] = "1.2.3"
Traceback (most recent call last):
pyderasn.InvalidValueType: invalid value type, expected: <class 'pyderasn.ObjectIdentifier'>
>>> ext["extnID"] = ObjectIdentifier("1.2.3")

You can determine if sequence is ready to be encoded:

>>> ext.ready
False
>>> ext.encode()
Traceback (most recent call last):
pyderasn.ObjNotReady: object is not ready: extnValue
>>> ext["extnValue"] = OctetString(b"foobar")
>>> ext.ready
True

A value you assign must have the same type as in the corresponding specification, but it can have different tags and optional/default attributes: they will be taken from the specification automatically:

class TBSCertificate(Sequence):
    schema = (
        ("version", Version(expl=tag_ctxc(0), default="v1")),
    [...]
>>> tbs = TBSCertificate()
>>> tbs["version"] = Version("v2") # no need to explicitly add ``expl``

Assign None to remove value from sequence.

You can set values in Sequence during its initialization:

>>> AlgorithmIdentifier((
    ("algorithm", ObjectIdentifier("1.2.3")),
    ("parameters", Any(Null()))
))
AlgorithmIdentifier SEQUENCE[algorithm: OBJECT IDENTIFIER 1.2.3; parameters: ANY 0500 OPTIONAL]

You can determine if value exists/set in the sequence and take its value:

>>> "extnID" in ext, "extnValue" in ext, "critical" in ext
(True, True, False)
>>> ext["extnID"]
OBJECT IDENTIFIER 1.2.3

But pay attention that if a value has a default, then it won’t be in the sequence (the in check fails, because DEFAULT must not be encoded in DER), but you can still read its value:

>>> "critical" in ext, ext["critical"]
(False, BOOLEAN False)
>>> ext["critical"] = Boolean(True)
>>> "critical" in ext, ext["critical"]
(True, BOOLEAN True)

All defaulted values are always optional.
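
The membership semantics of defaulted fields can be modelled with a small mapping in which defaults are readable but never count as stored. DefaultedFields is a toy class written for this illustration only and says nothing about how Sequence is implemented.

```python
class DefaultedFields:
    """Toy mapping: defaults are readable but never count as stored."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._values = {}

    def __setitem__(self, name, value):
        if value is None:
            self._values.pop(name, None)  # assigning None removes the field
        else:
            self._values[name] = value

    def __contains__(self, name):
        return name in self._values

    def __getitem__(self, name):
        if name in self._values:
            return self._values[name]
        return self._defaults.get(name)


ext = DefaultedFields({"critical": False})
before = ("critical" in ext, ext["critical"])
ext["critical"] = True
after = ("critical" in ext, ext["critical"])
```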

DER prohibits default value encoding and will raise an error if a default value is unexpectedly met during decode. If the bered context option is set, then no error will be raised, but the bered attribute will be set. You can disable strict defaulted values existence validation by setting "allow_default_values": True context option.

All values with DEFAULT specified are decoded atomically in evgen mode. If DEFAULT value is some kind of SEQUENCE, then it will be yielded as a single element, not disassembled. That is required for DEFAULT existence check.

Two sequences are equal if they have equal specification (schema), implicit/explicit tagging and the same values.

__init__(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

Set

class pyderasn.Set(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

SET structure type

Its usage is identical to pyderasn.Sequence.

DER prohibits unordered values encoding and will raise an error during decode. If bered context option is set, then no error will occur. Also you can disable strict values ordering check by setting "allow_unordered_set": True context option.

__init__(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

SequenceOf

class pyderasn.SequenceOf(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

SEQUENCE OF sequence type

For that kind of type you must specify the object it will carry (bounds are here only as an example, they are not required):

class Ints(SequenceOf):
    schema = Integer()
    bounds = (0, 2)
>>> ints = Ints()
>>> ints.append(Integer(123))
>>> ints.append(Integer(234))
>>> ints
Ints SEQUENCE OF[INTEGER 123, INTEGER 234]
>>> [int(i) for i in ints]
[123, 234]
>>> ints.append(Integer(345))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 0 <= 3 <= 2
>>> ints[1]
INTEGER 234
>>> ints[1] = Integer(345)
>>> ints
Ints SEQUENCE OF[INTEGER 123, INTEGER 345]

You can initialize sequence with preinitialized values:

>>> ints = Ints([Integer(123), Integer(234)])

Also you can use iterator as a value:

>>> ints = Ints(iter(Integer(i) for i in range(1000000)))

It won’t be iterated until the encoding process. Pay attention that bounds and required schema checks are done only during the encoding process in that case! After encode is called, the value is reset to an empty list and you have to set it again. That mode is useful mainly with CER encoding, where all objects from the iterable are streamed to the buffer without copying all of them to memory first.
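
The deferred behaviour of iterator values can be pictured with a toy container that checks bounds only while writing and forgets a consumed iterator afterwards. LazyList is a hypothetical class, and its writer argument is just a callable taking one item.

```python
class LazyList:
    """Toy container: an iterator value is consumed only when encoding."""

    def __init__(self, value, bounds=(0, float("inf"))):
        self.value = value
        self.bounds = bounds

    def encode(self, writer):
        lazy = not isinstance(self.value, (list, tuple))
        count = 0
        for item in self.value:
            count += 1
            if count > self.bounds[1]:
                raise ValueError("unsatisfied bounds")
            writer(item)
        if count < self.bounds[0]:
            raise ValueError("unsatisfied bounds")
        if lazy:
            self.value = []  # a consumed iterator cannot be replayed
        return count


chunks = []
lazy_list = LazyList(b"%d" % i for i in range(3))
written = lazy_list.encode(chunks.append)
```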

__init__(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

SetOf

class pyderasn.SetOf(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

SET OF sequence type

Its usage is identical to pyderasn.SequenceOf.

__init__(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))

Various

pyderasn.abs_decode_path(decode_path, rel_path)

Create an absolute decode path from current and relative ones

Parameters:
  • decode_path – current decode path, starting point. Tuple of strings

  • rel_path – relative path to decode_path. Tuple of strings. If first tuple’s element is “/”, then treat it as an absolute path, ignoring decode_path as starting point. Also this tuple can contain “..” elements, each stripping the last element from decode_path

>>> abs_decode_path(("foo", "bar"), ("baz", "whatever"))
("foo", "bar", "baz", "whatever")
>>> abs_decode_path(("foo", "bar", "baz"), ("..", "..", "whatever"))
("foo", "whatever")
>>> abs_decode_path(("foo", "bar"), ("/", "baz", "whatever"))
("baz", "whatever")
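
The resolution rules can be restated as a few lines of plain Python. resolve_decode_path is a hypothetical re-implementation written for illustration. In particular, accepting ".." anywhere in rel_path (not only at its start) is an assumption of this sketch.

```python
def resolve_decode_path(decode_path, rel_path):
    """Combine a current decode path with a relative one."""
    if rel_path and rel_path[0] == "/":
        return tuple(rel_path[1:])  # absolute: ignore the starting point
    result = list(decode_path)
    for part in rel_path:
        if part == "..":
            if result:
                result.pop()  # step one level up
        else:
            result.append(part)
    return tuple(result)
```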
pyderasn.agg_octet_string(evgens, decode_path, raw, writer)

Aggregate constructed string (OctetString and its derivatives)

Parameters:
  • evgens – iterator of generated events

  • decode_path – points to the string we want to decode

  • raw – sliceable (memoryview, bytearray, etc) with the data evgens are generated on

  • writer – buffer.write where string is going to be saved. Must comply with io.RawIOBase.write behaviour

See also

agg_octet_string

pyderasn.ascii_visualize(ba)

Output only ASCII printable characters, like in hexdump -C

Example output for given binary string (right part):

92 2b 39 20 65 91 e6 8e  95 93 1a 58 df 02 78 ea  |.+9 e......X..x.|
                                                   ^^^^^^^^^^^^^^^^
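
The right-hand column of such output follows a simple rule: printable ASCII is kept and everything else becomes a dot. A one-function sketch, where printable_column is a hypothetical name:

```python
def printable_column(data):
    """Replace every byte outside 0x20..0x7e with a dot."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


sample = bytes.fromhex("922b39206591e68e95931a58df0278ea")
column = printable_column(sample)
```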
pyderasn.colonize_hex(hexed)

Separate hexadecimal string with colons
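
A plausible reading of that operation is inserting a colon between each pair of hexadecimal digits. with_colons below is a hypothetical stand-in. Whether the library changes letter case or accepts odd-length input is not stated here, so the sketch keeps the case and rejects odd lengths.

```python
def with_colons(hexed):
    """Join consecutive two-digit groups with colons."""
    if len(hexed) % 2:
        raise ValueError("expected an even number of hexadecimal digits")
    return ":".join(hexed[i:i + 2] for i in range(0, len(hexed), 2))
```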

pyderasn.encode2pass(obj)

Encode (2-pass mode) to DER in memory buffer

Returns bytes:

memory buffer contents

pyderasn.encode_cer(obj)

Encode to CER in memory buffer

Returns bytes:

memory buffer contents

pyderasn.file_mmaped(fd)

Make mmap-ed memoryview for reading from file

Parameters:

fd – file object

Returns:

memoryview over read-only mmap-ing of the whole file

Warning

It does not work under Windows.
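
For readers unfamiliar with the technique, a read-only memoryview over a memory-mapped file can be obtained with the standard mmap module roughly as below. mapped_view is a hypothetical helper. It uses access=mmap.ACCESS_READ, which may differ from the flags the library itself passes.

```python
import mmap
import os
import tempfile


def mapped_view(fd):
    """Return (memoryview, mmap object) over the whole file, read-only."""
    mapping = mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mapping), mapping


# Write a small DER blob to a temporary file, then read it through the map.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes.fromhex("0403666f6f"))
    path = tmp.name
try:
    with open(path, "rb") as fd:
        view, mapping = mapped_view(fd)
        header = bytes(view[:2])
        payload = bytes(view[2:])
        view.release()   # release the export before closing the mapping
        mapping.close()
finally:
    os.unlink(path)
```

Slicing the view does not copy file contents. Only the explicit bytes() calls do.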

pyderasn.hexenc(data)

Convert binary data to a hexadecimal string

pyderasn.hexdec(data)

Convert a hexadecimal string to binary data
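
Judging by the examples elsewhere in this reference, hexenc turns bytes into a hexadecimal text string and hexdec goes the other way. An equivalent pair can be sketched with binascii. to_hex and from_hex are hypothetical names, not the library's functions.

```python
from binascii import hexlify, unhexlify


def to_hex(data):
    """Bytes -> lowercase hexadecimal text."""
    return hexlify(data).decode("ascii")


def from_hex(text):
    """Hexadecimal text -> bytes."""
    return unhexlify(text)
```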

pyderasn.hexdump(raw)

Generate hexdump -C like output

Rendered example:

00000000  30 80 30 80 a0 80 02 01  02 00 00 02 14 54 a5 18  |0.0..........T..|
00000010  69 ef 8b 3f 15 fd ea ad  bd 47 e0 94 81 6b 06 6a  |i..?.....G...k.j|

Result of that function is a generator of lines, where each line is a list of columns:

[
    [...],
    ["00000010 ", " 69", " ef", " 8b", " 3f", " 15", " fd", " ea", " ad ",
                  " bd", " 47", " e0", " 94", " 81", " 6b", " 06", " 6a ",
                  " |i..?.....G...k.j|"]
    [...],
]
pyderasn.tag_encode(num, klass=0, form=0)

Encode tag to binary form

Parameters:
  • num (int) – tag’s number

  • klass (int) – tag’s class (pyderasn.TagClassUniversal, pyderasn.TagClassContext, pyderasn.TagClassApplication, pyderasn.TagClassPrivate)

  • form (int) – tag’s form (pyderasn.TagFormPrimitive, pyderasn.TagFormConstructed)

pyderasn.tag_decode(tag)

Decode tag from binary form

Warning

No validation is performed, assuming that it has already passed.

It returns tuple with three integers, as pyderasn.tag_encode() accepts.

pyderasn.tag_ctxp(num)

Create CONTEXT PRIMITIVE tag

pyderasn.tag_ctxc(num)

Create CONTEXT CONSTRUCTED tag
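
The identifier octets these helpers produce follow X.690: two class bits, one form bit and the tag number, with a multi-octet form for numbers of 31 and above. The sketch below uses its own constant names (CLASS_*, FORM_*) and hypothetical functions encode_tag and decode_tag. The (klass, form, num) order of the returned tuple is an assumption.

```python
CLASS_UNIVERSAL, CLASS_APPLICATION = 0x00, 0x40
CLASS_CONTEXT, CLASS_PRIVATE = 0x80, 0xC0
FORM_PRIMITIVE, FORM_CONSTRUCTED = 0x00, 0x20


def encode_tag(num, klass=CLASS_UNIVERSAL, form=FORM_PRIMITIVE):
    """Encode an identifier: class and form bits, then the tag number."""
    if num < 0:
        raise ValueError("tag number must be non-negative")
    if num < 31:
        return bytes([klass | form | num])
    # High-tag-number form: 0x1f marker followed by base-128 digits.
    digits = [num & 0x7F]
    num >>= 7
    while num:
        digits.append(0x80 | (num & 0x7F))
        num >>= 7
    return bytes([klass | form | 0x1F]) + bytes(reversed(digits))


def decode_tag(tag):
    """Inverse of encode_tag; assumes a well-formed identifier."""
    klass, form, num = tag[0] & 0xC0, tag[0] & 0x20, tag[0] & 0x1F
    if num == 0x1F:
        num = 0
        for octet in tag[1:]:
            num = (num << 7) | (octet & 0x7F)
    return klass, form, num
```

Under this sketch a CONTEXT PRIMITIVE tag 1 is the single octet 0x81 and a CONTEXT CONSTRUCTED tag 0 is 0xa0.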

class pyderasn.DecodeError(msg='', klass=None, decode_path=(), offset=0)
__init__(msg='', klass=None, decode_path=(), offset=0)
Parameters:
  • msg (str) – reason of decode failing

  • klass – optional exact DecodeError inherited class (like NotEnoughData, TagMismatch, InvalidLength)

  • decode_path – tuple of strings. It contains human readable names of the fields through which decoding process has passed

  • offset (int) – binary offset where failure happened

class pyderasn.NotEnoughData(msg='', klass=None, decode_path=(), offset=0)
class pyderasn.ExceedingData(nbytes)
class pyderasn.LenIndefForm(msg='', klass=None, decode_path=(), offset=0)
class pyderasn.TagMismatch(msg='', klass=None, decode_path=(), offset=0)
class pyderasn.InvalidLength(msg='', klass=None, decode_path=(), offset=0)
class pyderasn.InvalidOID(msg='', klass=None, decode_path=(), offset=0)
class pyderasn.ObjUnknown(name)
class pyderasn.ObjNotReady(name)
class pyderasn.InvalidValueType(expected_types)
class pyderasn.BoundsError(bound_min, value, bound_max)

Command-line usage

You can decode DER/BER files using command line abilities:

$ python -m pyderasn --schema tests.test_crts:Certificate path/to/file

If there is no schema for your file, then you can try parsing it without one, but of course IMPLICIT tags will often make that impossible. The result is still good enough for the certificate above:

$ python -m pyderasn path/to/file
    0   [1,3,1604]  . >: SEQUENCE OF
    4   [1,3,1453]  . . >: SEQUENCE OF
    8   [0,0,   5]  . . . . >: [0] ANY
                    . . . . . A0:03:02:01:02
   13   [1,1,   3]  . . . . >: INTEGER 61595
   18   [1,1,  13]  . . . . >: SEQUENCE OF
   20   [1,1,   9]  . . . . . . >: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
   31   [1,1,   0]  . . . . . . >: NULL
   33   [1,3, 274]  . . . . >: SEQUENCE OF
   37   [1,1,  11]  . . . . . . >: SET OF
   39   [1,1,   9]  . . . . . . . . >: SEQUENCE OF
   41   [1,1,   3]  . . . . . . . . . . >: OBJECT IDENTIFIER 2.5.4.6
   46   [1,1,   2]  . . . . . . . . . . >: PrintableString PrintableString ES
[...]
 1409   [1,1,  50]  . . . . . . >: SEQUENCE OF
 1411   [1,1,   8]  . . . . . . . . >: OBJECT IDENTIFIER 1.3.6.1.5.5.7.1.1
 1421   [1,1,  38]  . . . . . . . . >: OCTET STRING 38 bytes
                    . . . . . . . . . 30:24:30:22:06:08:2B:06:01:05:05:07:30:01:86:16
                    . . . . . . . . . 68:74:74:70:3A:2F:2F:6F:63:73:70:2E:69:70:73:63
                    . . . . . . . . . 61:2E:63:6F:6D:2F
 1461   [1,1,  13]  . . >: SEQUENCE OF
 1463   [1,1,   9]  . . . . >: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
 1474   [1,1,   0]  . . . . >: NULL
 1476   [1,2, 129]  . . >: BIT STRING 1024 bits
                    . . . 68:EE:79:97:97:DD:3B:EF:16:6A:06:F2:14:9A:6E:CD
                    . . . 9E:12:F7:AA:83:10:BD:D1:7C:98:FA:C7:AE:D4:0E:2C
[...]

Human readable OIDs

If you have dictionaries with ObjectIdentifiers, like this example from tests/test_crts.py:

stroid2name = {
    "1.2.840.113549.1.1.1": "id-rsaEncryption",
    "1.2.840.113549.1.1.5": "id-sha1WithRSAEncryption",
    [...]
    "2.5.4.10": "id-at-organizationName",
    "2.5.4.11": "id-at-organizationalUnitName",
}

then you can pass it to pretty printer to see human readable OIDs:

$ python -m pyderasn --oids tests.test_crts:stroid2name path/to/file
[...]
   37   [1,1,  11]  . . . . . . >: SET OF
   39   [1,1,   9]  . . . . . . . . >: SEQUENCE OF
   41   [1,1,   3]  . . . . . . . . . . >: OBJECT IDENTIFIER id-at-countryName (2.5.4.6)
   46   [1,1,   2]  . . . . . . . . . . >: PrintableString PrintableString ES
   50   [1,1,  18]  . . . . . . >: SET OF
   52   [1,1,  16]  . . . . . . . . >: SEQUENCE OF
   54   [1,1,   3]  . . . . . . . . . . >: OBJECT IDENTIFIER id-at-stateOrProvinceName (2.5.4.8)
   59   [1,1,   9]  . . . . . . . . . . >: PrintableString PrintableString Barcelona
   70   [1,1,  18]  . . . . . . >: SET OF
   72   [1,1,  16]  . . . . . . . . >: SEQUENCE OF
   74   [1,1,   3]  . . . . . . . . . . >: OBJECT IDENTIFIER id-at-localityName (2.5.4.7)
   79   [1,1,   9]  . . . . . . . . . . >: PrintableString PrintableString Barcelona
[...]
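
The substitution visible in that output (name first, dotted form in parentheses) amounts to a dictionary lookup with a fallback. pretty_oid is a hypothetical helper that mimics the formatting, and the dictionary repeats only the entries shown above.

```python
stroid2name = {
    "1.2.840.113549.1.1.1": "id-rsaEncryption",
    "1.2.840.113549.1.1.5": "id-sha1WithRSAEncryption",
    "2.5.4.10": "id-at-organizationName",
    "2.5.4.11": "id-at-organizationalUnitName",
}


def pretty_oid(dotted, names):
    """Show 'name (oid)' for known identifiers, the bare OID otherwise."""
    name = names.get(dotted)
    return dotted if name is None else "%s (%s)" % (name, dotted)
```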

Decode paths

Each decoded element has a so-called decode path: the sequence of structure names it passes through during the decode process. Each element has its own unique path inside the whole ASN.1 tree. You can print it out with the --print-decode-path option:

$ python -m pyderasn --schema path.to:Certificate --print-decode-path path/to/file
   0    [1,3,1604]  Certificate SEQUENCE []
   4    [1,3,1453]   . tbsCertificate: TBSCertificate SEQUENCE [tbsCertificate]
  10-2  [1,1,   1]   . . version: [0] EXPLICIT Version INTEGER v3 OPTIONAL [tbsCertificate:version]
  13    [1,1,   3]   . . serialNumber: CertificateSerialNumber INTEGER 61595 [tbsCertificate:serialNumber]
  18    [1,1,  13]   . . signature: AlgorithmIdentifier SEQUENCE [tbsCertificate:signature]
  20    [1,1,   9]   . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5 [tbsCertificate:signature:algorithm]
  31    [0,0,   2]   . . . parameters: [UNIV 5] ANY OPTIONAL [tbsCertificate:signature:parameters]
                     . . . . 05:00
  33    [0,0, 278]   . . issuer: Name CHOICE rdnSequence [tbsCertificate:issuer]
  33    [1,3, 274]   . . . rdnSequence: RDNSequence SEQUENCE OF [tbsCertificate:issuer:rdnSequence]
  37    [1,1,  11]   . . . . 0: RelativeDistinguishedName SET OF [tbsCertificate:issuer:rdnSequence:0]
  39    [1,1,   9]   . . . . . 0: AttributeTypeAndValue SEQUENCE [tbsCertificate:issuer:rdnSequence:0:0]
  41    [1,1,   3]   . . . . . . type: AttributeType OBJECT IDENTIFIER 2.5.4.6 [tbsCertificate:issuer:rdnSequence:0:0:type]
  46    [0,0,   4]   . . . . . . value: [UNIV 19] AttributeValue ANY [tbsCertificate:issuer:rdnSequence:0:0:value]
                     . . . . . . . 13:02:45:53
  46    [1,1,   2]   . . . . . . . DEFINED BY 2.5.4.6: CountryName PrintableString ES [tbsCertificate:issuer:rdnSequence:0:0:value:DEFINED BY 2.5.4.6]
[...]

Now you can print only the specified tree, for example signature algorithm:

$ python -m pyderasn --schema path.to:Certificate --decode-path-only tbsCertificate:signature path/to/file
  18    [1,1,  13]  AlgorithmIdentifier SEQUENCE
  20    [1,1,   9]   . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
  31    [0,0,   2]   . parameters: [UNIV 5] ANY OPTIONAL
                     . . 05:00
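
Selecting a subtree by decode path is essentially prefix matching on tuples of names. The sketch below filters a hand-written list of (decode_path, description) rows. select_subtree and the rows are illustrative assumptions, not output of the tool.

```python
def select_subtree(rows, only):
    """Keep rows whose decode path starts with the colon-separated prefix."""
    prefix = tuple(only.split(":")) if only else ()
    return [
        (path[len(prefix):], text)
        for path, text in rows
        if path[:len(prefix)] == prefix
    ]


rows = [
    ((), "Certificate SEQUENCE"),
    (("tbsCertificate",), "TBSCertificate SEQUENCE"),
    (("tbsCertificate", "signature"), "AlgorithmIdentifier SEQUENCE"),
    (("tbsCertificate", "signature", "algorithm"),
     "OBJECT IDENTIFIER 1.2.840.113549.1.1.5"),
    (("tbsCertificate", "issuer"), "Name CHOICE rdnSequence"),
]
selected = select_subtree(rows, "tbsCertificate:signature")
```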