Attention
This update features:
Support for model-to-model conversion.
Support for attrs and sqlalchemy (integration with many other libraries is coming).
Fully redesigned API helping to follow DRY.
Performance improvements of up to two times.
Extended usage
You can configure the factory during its creation. You can’t change the settings later because they affect parsers, which are created only once for each instance of a factory.
Most of the configuration is done via Schemas. You can set default schema or one per type:
factory = Factory(default_schema=Schema(...), schemas={ClassA: Schema(...)})
More verbose errors
Currently, errors are not very verbose. But you can make them a bit better using debug_path
of a factory.
It is disabled by default because affects performance.
In this mode InvalidFieldError
is thrown when some dataclass field cannot be parsed.
It contains field_path
which is a path to the field in provided data (key and indexes).
Working with field names
Name mapping
In some cases, you have json with keys that leave much to be desired. For example, they might contain spaces or just have unclear meanings. The simplest way to fix it is to set a custom name mapping. You can call fields as you want and the factory will translate them using your mapping.
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: int
data = {
"title": "Fahrenheit 451",
"book price": 100,
}
book_schema = Schema(name_mapping={
"price": "book price"
})
factory = dataclass_factory.Factory(schemas={Book: book_schema})
book: Book = factory.load(data, Book)
serialized = factory.dump(book)
Fields absent in mapping are not translated and used with their original names (as in dataclass specification).
Stripping underscore
It is often unnecessary to fill name mapping. One of the most common cases is dictionary keys which are python keywords.
For example, you cannot use the string from
as a field name, but it is very likely to see in APIs. Usually, it is solved by adding a trailing underscore (e.g. from_
).
Dataclass factory will trim trailing underscores so you won’t meet this case.
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Period:
from_: int
to_: int
data = {
"from": 1,
"to": 100,
}
factory = dataclass_factory.Factory()
period = factory.load(data, Period)
Sometimes this behavior is unwanted, so you can disable this feature by setting trim_trailing_underscore=False
in Schema (in default schema of the concrete one).
Also, you can re-enable it for certain types.
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Period:
from_: int
to_: int
data = {
"from_": 1,
"to_": 100,
}
factory = dataclass_factory.Factory(default_schema=Schema(trim_trailing_underscore=False))
period = factory.load(data, Period)
factory.dump(period)
Name styles
Sometimes json keys are quite normal but ugly. For example, they are named using CamelCase, but PEP8 recommends you to use snake_case. Of cause, you can prepare name mapping, but it is too much to write for such a stupid thing.
The library can translate such names automatically. You need to declare fields as recommended by PEP8 (e.g. field_name) and set corresponding name_style
.
As usual, if no style is set for a certain type, it will be taken from the default schema.
By the way, you cannot convert names that do not follow snake_case style. In this case, the only valid style is ignore
from dataclasses import dataclass
from dataclass_factory import Factory, Schema, NameStyle
factory = Factory(default_schema=Schema(
name_style=NameStyle.camel
))
@dataclass
class Person:
first_name: str
last_name: str
person = Person("ivan", "petrov")
serial_person = {
"FirstName": "ivan",
"LastName": "petrov"
}
assert factory.dump(person) == serial_person
Following name styles are supported:
snake
(snake_case)kebab
(kebab-case)camel_lower
(camelCaseLower)camel
(CamelCase)lower
(lowercase)upper
(UPPERCASE)upper_snake
(UPPER_SNAKE_CASE)camel_snake
(Camel_Snake)dot
(dot.case)camel_dot
(Camel.Dot)upper_dot
(UPPER.DOT)ignore
(not real style, but just does no conversion)
Selecting and skipping fields
You have several ways to skip processing of some fields.
Note
Skipped fields MUST NOT be required in class constructor, otherwise parsing will fail
Only and exclude
If you know exactly what fields must be parsed/serialized and want to ignore all others just set them as only
parameter of schema.
Also, you can provide a list with excluded names via exclude
.
It affects both parsing and serializing.
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: int
extra: str = ""
data = {
"title": "Fahrenheit 451",
"price": 100,
"extra": "some extra string"
}
# using `only`:
factory = dataclass_factory.Factory(schemas={Book: Schema(only=["title", "price"])})
book: Book = factory.load(data, Book) # Same as Book(title="Fahrenheit 451", price=100)
serialized = factory.dump(book) # no `extra` key will be in serialized
# using `exclude`
factory = dataclass_factory.Factory(schemas={Book: Schema(exclude=["extra"])})
book: Book = factory.load(data, Book) # Same as Book(title="Fahrenheit 451", price=100)
serialized = factory.dump(book) # no `extra` key will be in serialized
Only mapped
Already have name_mapping
and do not want to repeat all names in only
parameter? Just set only_mapped=True
. It will ignore all fields which are not described in name mapping.
Skip Internal
More simplified case is to skip so-called internal use fields, those fields which name starts with underscore.
You can skip them from parsing and serialization using skip_internal
option of schema.
It is disabled by default. It affects both parsing and serializing.
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: int
_total: int = 0
data = {
"title": "Fahrenheit 451",
"price": 100,
"_total": 1000,
}
factory = dataclass_factory.Factory(default_schema=Schema(skip_internal=True))
book: Book = factory.load(data, Book) # Same as Book(title="Fahrenheit 451", price=100)
serialized = factory.dump(book) # no `_total` key will be produced
Omit default
If you have defaults for some fields, it is unnecessary to store them in serialized representation. For example, this may be None
, empty list or something else.
You can omit them when serializing using omit_default
option. Those values that are equal to default, will be stripped from the resulting dict.
It is disabled by default. It affects only serialising.
from typing import Optional, List
from dataclasses import dataclass, field
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: Optional[int] = None
authors: List[str] = field(default_factory=list)
data = {
"title": "Fahrenheit 451",
}
factory = dataclass_factory.Factory(default_schema=Schema(omit_default=True))
book = Book(title="Fahrenheit 451", price=None, authors=[])
serialized = factory.dump(book) # no `price` and `authors` key will be produced
assert data == serialized
Structure flattening
Another case of ugly API is a too complex hierarchy of data. You can fix it using already known name_mapping
.
Earlier, you used it to rename fields, but also you can use it to map a name to a nested value by specifying a path to it.
Integers in the path are treated as list indices, strings - as dict keys. It affects parsing and serializing.
For example, you have an author of a book with only field - name (see Nested objects). You can expand this dict and store the author name directly in your Book class.
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: int
author: str
data = {
"title": "Fahrenheit 451",
"price": 100,
"author": {
"name": "Ray Bradbury"
}
}
book_schema = Schema(
name_mapping={
"author": ("author", "name")
}
)
factory = dataclass_factory.Factory(schemas={Book: book_schema})
# Book(title="Fahrenheit 451", price=100, author="Ray Bradbury")
book: Book = factory.load(data, Book)
serialized = factory.dump(book)
assert serialized == data
We can modify example above to store author as a list with name
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: int
author: str
data = {
"title": "Fahrenheit 451",
"price": 100,
"author": ["Ray Bradbury"]
}
book_schema = Schema(
name_mapping={
"author": ("author", 0)
}
)
factory = dataclass_factory.Factory(schemas={Book: book_schema})
# Book(title="Fahrenheit 451", price=100, author="Ray Bradbury")
book: Book = factory.load(data, Book)
serialized = factory.dump(book)
assert serialized == data
Automatic naming during flattening
If names somewhere in “complex” structure are the same, as in your class you can simplify your schema using ellipsis (...
).
There are two simple rules:
...
as as a key inname_mapping
means Any field. Path will be applied to every field that is not declared explicitly in mapping...
inside path inname_mapping
means that original name of field will be reused. If name style or other rules are provided the will be applied to the name.
Examples:
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: int
author: str
data = {
"book": {
"title": "Fahrenheit 451",
"price": 100,
},
"author": {
"name": "Ray Bradbury"
}
}
book_schema = Schema(
name_mapping={
"author": (..., "name"),
...: ("book", ...)
}
)
factory = dataclass_factory.Factory(schemas={Book: book_schema})
# Book(title="Fahrenheit 451", price=100, author="Ray Bradbury")
book: Book = factory.load(data, Book)
serialized = factory.dump(book)
assert serialized == data
Parsing unknown fields
By default, all extra fields that are absent in the target structure are ignored. But this behavior is not necessary.
For now, you can select from several variants setting unknown
attribute of Schema
Unknown.SKIP
- default behavior. All unknown fields are ignored (skipped)Unknown.FORBID
-UnknownFieldsError
is raised in case of any unknown field is foundUnknown.STORE
- all unknown fields passed unparsed to the constructor of a class. Your__init__
must be ready for thisField name (
str
). The specified field is filled with all unknowns and the parser of the corresponding type is called. For simple cases, you can annotate that field withDict
type. In the case of serialization, this field is also serialized and the result is merged up with the current result.Several field names (sequence of
str
). The behavior is very similar to the case with one field name. All unknowns are collected to a single dict and it is passed to parsers of each provided field (be careful modifying data atpre_parse
step). Also, their dump results are merged when serializing
from typing import Optional, Dict
from dataclasses import dataclass
from dataclass_factory import Factory, Schema
@dataclass
class Sub:
b: str
@dataclass
class Data:
a: str
unknown: Optional[Dict] = None
sub: Optional[Sub] = None
serialized = {
"a": "A1",
"b": "B2",
"c": "C3",
}
factory = Factory(default_schema=Schema(unknown=["unknown", "sub"]))
data = factory.load(serialized, Data)
assert data == Data(a="A1", unknown={"b": "B2", "c": "C3"}, sub=Sub("B2"))
Additional steps
Most of the work is done automatically, but you may want to do some additional processing.
Real parsing process has following flow:
╔══════╗ ┌───────────┐ ┌────────┐ ┌────────────┐ ╔════════╗
║ data ║ ---> │ pre_parse │ ---> │ parser │ ---> │ post_parse │ ---> ║ result ║
╚══════╝ └───────────┘ └────────┘ └────────────┘ ╚════════╝
The same is for serializing:
╔══════╗ ┌───────────────┐ ┌────────────┐ ┌────────────────┐ ╔════════╗
║ data ║ ---> │ pre_serialize │ ---> │ serializer │ ---> │ post_serialize │ ---> ║ result ║
╚══════╝ └───────────────┘ └────────────┘ └────────────────┘ ╚════════╝
So the return value of pre_parse
is passed to parser
, and return value of post_parse
is used as the total result of the parsing process.
You can add your logic at any step, but mind the main difference:
pre_parse
andpost_serialize
work with serialized representation of data (e.g. dict for dataclasses)post_parse
andpre_serialize
work with instances of your classes.
So if you want to do some validation - it is better to do at post_parse
step.
And if you want to do polymorphic parsing - check if a type is suitable before parsing is started at pre_parse
.
Another case is to change the representation of some fields: serialize json to string, split values and so on.
import json
from typing import List
from dataclasses import dataclass
from dataclass_factory import Schema, Factory
@dataclass
class Data:
items: List[str]
name: str
def post_serialize(data):
data["items"] = json.dumps(data["items"])
return data
def pre_parse(data):
data["items"] = json.loads(data["items"])
return data
def post_parse(data: Data) -> Data:
if not data.name:
raise ValueError("Name must not be empty")
return data
data_schema = Schema[Data](
post_serialize=post_serialize,
pre_parse=pre_parse,
post_parse=post_parse,
)
factory = Factory(schemas={Data: data_schema})
data = Data(['a', 'b'], 'My Name')
serialized = {'items': '["a", "b"]', 'name': 'My Name'}
assert factory.dump(data) == serialized
assert factory.load(serialized, Data) == data
try:
factory.load({'items': '[]', 'name': ''}, Data)
except ValueError as e:
print("Error detected:", e) # Error detected: Name must not be empty
Schema inheritance
In some cases, it might be useful to subclass Schema instead of just creating instances normally.
from typing import Any
from dataclasses import dataclass
import dataclass_factory
from dataclass_factory import Schema
@dataclass
class Book:
title: str
price: int
_author: str = "Unknown author"
data = {
"title": "Fahrenheit 451",
"price": 100,
}
class DataSchema(Schema[Any]):
skip_internal = True
def post_parse(self, data):
print("parsing done")
return data
factory = dataclass_factory.Factory(schemas={Book: DataSchema(trim_trailing_underscore=False)})
book: Book = factory.load(data, Book) # Same as Book(title="Fahrenheit 451", price=100)
serialized = factory.dump(book)
Note
In versions <2.9: Factory created a copy of a schema for each type filling in missed args.
If you need to get access to some data in schema, get a working instance of the schema with Factory.schema
method
Note
Single schema instance can be used multiple time simultaneously because of multithreading or recursive structures. Be careful modifying data in the schema
Json-schema
You can generate json schema for your classes.
Note that factory does it lazily and caches the result. So, if you need definitions for all of your classes, create schema for each top-level class using the json_schema
method
and then collect all definitions using json_schema_definitions
import json
from enum import Enum
from typing import Dict, Union
from dataclasses import dataclass, field
from dataclass_factory import Factory, Schema
class A(Enum):
X = "x"
Y = 1
@dataclass
class Data:
a: A
dict_: Dict[str, Union[int, float]]
dictw_: Dict[str, Union[int, float]] = field(default_factory=dict)
optional_num: int = 0
factory = Factory(schemas={A: Schema(description="My super `A` class")})
print(json.dumps(factory.json_schema(Data), indent=2))
print(json.dumps(factory.json_schema_definitions(), indent=2))
Result of json_schema
call is
{
"title": "Data",
"type": "object",
"properties": {
"a": {
"$ref": "#/definitions/A"
},
"dict": {
"$ref": "#/definitions/typing.Dict[str, typing.Union[int, float]]"
},
"dictw": {
"$ref": "#/definitions/typing.Dict[str, typing.Union[int, float]]",
"default": {}
},
"optional_num": {
"type": "integer",
"default": 0
}
},
"additionalProperties": true,
"required": [
"a",
"dict"
]
}
Result of json_schema_definitions
call is
{
"definitions": {
"A": {
"title": "A",
"description": "My super `A` class",
"enum": [
"x",
1
]
},
"typing.Union[int, float]": {
"title": "typing.Union[int, float]",
"anyOf": [
{
"type": "integer"
},
{
"type": "number"
}
]
},
"typing.Dict[str, typing.Union[int, float]]": {
"title": "typing.Dict[str, typing.Union[int, float]]",
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/typing.Union[int, float]"
}
},
"Data": {
"title": "Data",
"type": "object",
"properties": {
"a": {
"$ref": "#/definitions/A"
},
"dict": {
"$ref": "#/definitions/typing.Dict[str, typing.Union[int, float]]"
},
"dictw": {
"$ref": "#/definitions/typing.Dict[str, typing.Union[int, float]]",
"default": {}
},
"optional_num": {
"type": "integer",
"default": 0
}
},
"additionalProperties": true,
"required": [
"a",
"dict"
]
}
}
}
Note
Not all features of dataclass factory are supported currently. You cannot generate json-schema if you use structure-flattening, additional parsing of unknown fields or init-based parsing. Also, if you have custom parsers or pre-parse step, schema might be incorrect.