Future Directions
Strategic
See Model Definition for a list of features.
- Additional string formats and patterns. 
- Additional numeric distributions. 
- Additional date, datetime, and time features. 
- Enumerated value with a distribution histogram. 
- Optional values are a more subtle aspect of a domain definition.
  - A domain-independent null is the SQL NULL or Python None value. This can be done with a JSONSchema oneOf and a json_schema_extra to provide the probability of a null. More generally, it requires a oneOf with probabilities for each alternative. This leads to an Annotated[Union[int, None, ...], etc.] with probabilities for the two alternatives.
  - A domain-specific null is a coded value, like the social security number 999-99-9999, that indicates some sort of missing or not-applicable value. This is also a complicated Union. This leads to a Union[Annotated[int, ...], Annotated[Literal[n], ...], etc.] with probabilities for each choice.
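One way the weighted Union idea might look is sketched below, using only the standard library. The Weight metadata class, the alternatives and synthesize helpers, and the integer range are all hypothetical names and values chosen for illustration; none of them are part of the existing package, and a real design would likely carry the probabilities through Pydantic metadata instead.

```python
import random
from dataclasses import dataclass
from typing import Annotated, Literal, Union, get_args, get_origin


@dataclass(frozen=True)
class Weight:
    """Hypothetical metadata: the probability of one Union alternative."""
    p: float


# Domain-independent null: an int most of the time, None otherwise.
OptionalInt = Union[Annotated[int, Weight(0.9)], Annotated[None, Weight(0.1)]]

# Domain-specific null: a coded value that stands in for "missing".
CodedSSN = Union[
    Annotated[int, Weight(0.95)],
    Annotated[Literal[999_99_9999], Weight(0.05)],
]


def alternatives(union_type):
    """Return [(base_type, probability), ...] for a Union of Annotated types."""
    result = []
    for alt in get_args(union_type):
        base, *metadata = get_args(alt)
        weight = next(m.p for m in metadata if isinstance(m, Weight))
        result.append((base, weight))
    return result


def synthesize(union_type, rng):
    """Pick one alternative by weight, then produce a value for it."""
    options = alternatives(union_type)
    bases = [base for base, _ in options]
    weights = [weight for _, weight in options]
    base = rng.choices(bases, weights=weights, k=1)[0]
    if base is type(None):
        return None
    if get_origin(base) is Literal:
        return get_args(base)[0]  # the coded "null" value
    if base is int:
        # Assumed range for illustration only.
        return rng.randint(100_00_0000, 899_99_9999)
    raise TypeError(f"no synthesizer for {base!r}")


if __name__ == "__main__":
    rng = random.Random(42)
    print([synthesize(OptionalInt, rng) for _ in range(5)])
    print([synthesize(CodedSSN, rng) for _ in range(5)])
```

The same weighted choice would also cover an enumerated value with a distribution histogram: each enumerated member becomes one alternative with its own weight.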
 
Tactical
Todo
Reduce reliance on the Pydantic FieldInfo and annotation classes.
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/base.py:docstring of synthdata.base, line 48.)
Todo
Handle recursive structures here.
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/base.py:docstring of synthdata.base.DataIter, line 7.)
Todo
Handle recursive structures here.
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/base.py:docstring of synthdata.base.BaseModelSynthesizer, line 6.)
Todo
Handle nested models.
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/base.py:docstring of synthdata.base.BaseModelSynthesizer.__init__, line 11.)
Todo
FKs may have optionality rules.
The SynthesizeReference instance may need a subdomain distribution.
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/base.py:docstring of synthdata.base.BaseModelSynthesizer.sql_rule, line 4.)
Todo
What kind of error for invalid values?
For now, we simply create an Independent behavior. Perhaps a ValueError is better? Or a warning?
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/base.py:docstring of synthdata.base.BaseModelSynthesizer.sql_rule, line 8.)
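To make the trade-off concrete, here is a minimal sketch of the two options side by side. The resolve_rule function, its strict flag, and the rule names in KNOWN_RULES are hypothetical; they are not the signature or vocabulary of sql_rule, only an illustration of warning-with-fallback versus raising.

```python
import warnings

# Hypothetical rule names for illustration; not the project's actual set.
KNOWN_RULES = {"pk", "fk", "independent"}


def resolve_rule(rule, strict=False):
    """Normalize a rule name; warn and fall back, or raise when strict."""
    normalized = rule.strip().lower()
    if normalized in KNOWN_RULES:
        return normalized
    message = f"unknown rule {rule!r}"
    if strict:
        # Option 1: fail fast so a misspelled rule cannot go unnoticed.
        raise ValueError(message)
    # Option 2: keep the current fallback, but make it visible.
    warnings.warn(f"{message}; using 'independent'", UserWarning, stacklevel=2)
    return "independent"
```

A warning keeps existing models working while surfacing the problem; a ValueError is safer when a silently independent field would produce misleading data.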
Todo
Confirm all alternatives defined in synth_class_map.
A None in synth_class_map means an unknown synth, possibly buried in a Union.
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/base.py:docstring of synthdata.base.BaseModelSynthesizer.make_field_synth, line 16.)
Todo
Improve the name generator with better patterns (and anti-patterns).
Options
- Get first names from census data; get digraph frequency from last names.
- Use NLTK digraph frequencies to generate plausible English-like words.
(The original entry is located in /Users/slott/github/local/DataSynthTool/docs/../src/synthdata/synths.py:docstring of synthdata.synths.SynthesizeName, line 3.)
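The digraph option could be prototyped with a first-order letter chain, as in the sketch below. It uses only the standard library; SAMPLE_NAMES is a tiny made-up list standing in for census or NLTK data, and digraph_counts and make_name are hypothetical helpers, not part of SynthesizeName.

```python
import random
from collections import Counter, defaultdict

# Made-up sample; real frequencies would come from census or corpus data.
SAMPLE_NAMES = [
    "anderson", "johnson", "martin", "morgan",
    "nelson", "roberts", "sanders", "thompson",
]
START, END = "^", "$"  # markers for the beginning and end of a name


def digraph_counts(names):
    """Count how often each letter follows another, including start and end."""
    counts = defaultdict(Counter)
    for name in names:
        padded = START + name.lower() + END
        for first, second in zip(padded, padded[1:]):
            counts[first][second] += 1
    return counts


def make_name(counts, rng, max_len=12):
    """Walk the digraph table from START until END or max_len letters."""
    letters = []
    current = START
    while len(letters) < max_len:
        followers = counts[current]
        current = rng.choices(
            list(followers), weights=list(followers.values()), k=1
        )[0]
        if current == END:
            break
        letters.append(current)
    return "".join(letters).capitalize()


if __name__ == "__main__":
    table = digraph_counts(SAMPLE_NAMES)
    rng = random.Random(2024)
    print([make_name(table, rng) for _ in range(5)])
```

An anti-pattern check could be layered on top by rejecting candidates that match a list of unwanted substrings and drawing again.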