Voluptuous Python Library: An Aid for Data Validation

Python libraries are reusable collections of code that we can include in a program instead of writing everything ourselves. There are over 137,000 libraries in Python, such as TensorFlow, NumPy, Keras, PyTorch, scikit-learn, and Voluptuous.

Voluptuous is a Python library used for data validation. Its main purpose is to validate data coming into Python from formats such as JSON and XML, once that data has been parsed into Python objects.

What is Data Validation?

Before we preprocess or analyze data in a program, it is essential to validate it. Data validation is a crucial step that evaluates the quality and accuracy of data. It ensures that the data is clean and that the input given is correct, so that the results of any analysis built on that data can be trusted.

Examples of validation checks:

  • Data should be free from null values
  • The range of values should be consistent
  • The records stored should be distinct and unique
  • The data type is valid
  • Meets the required constraints
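The checks listed above can be sketched in plain Python before any library is involved. This is a minimal illustration only: the function check_records, the field names sid and marks, and the 0 to 100 range are assumptions made for the example, not part of Voluptuous.

```python
def check_records(records, low=0, high=100):
    """Return a list of human-readable problems found in `records`."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Null check: no field may be None.
        if any(value is None for value in rec.values()):
            problems.append(f"record {i}: contains a null value")
        # Type check, then range check, on the hypothetical 'marks' field.
        marks = rec.get("marks")
        if not isinstance(marks, int):
            problems.append(f"record {i}: marks must be an int")
        elif not low <= marks <= high:
            problems.append(f"record {i}: marks out of range")
        # Uniqueness check on the hypothetical 'sid' field.
        sid = rec.get("sid")
        if sid in seen_ids:
            problems.append(f"record {i}: duplicate sid {sid}")
        seen_ids.add(sid)
    return problems


records = [
    {"sid": 1, "marks": 75},
    {"sid": 1, "marks": 130},   # duplicate sid and out-of-range marks
    {"sid": 2, "marks": None},  # null value, so marks is not an int either
]
problems = check_records(records)
for problem in problems:
    print(problem)
```

Writing such checks by hand quickly becomes repetitive, which is the gap a validation library is meant to fill.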

JSON and XML

XML:

XML stands for eXtensible Markup Language. It is a markup language that comes in handy when we have a small amount of data to work with and do not want to use a SQL database. Python is widely used in web development and data analysis, so there is a high chance that one will have to work with XML data. Python's standard library provides two main interfaces for working with XML data: the Simple API for XML (SAX) and the Document Object Model (DOM) API.
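As a rough sketch of the two interfaces, the snippet below reads the same small document with DOM (xml.dom.minidom) and with SAX (xml.sax), both from the standard library. The XML content and the StudentHandler class are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

XML_TEXT = (
    "<students>"
    "<student sid='1'>Asha</student>"
    "<student sid='2'>Ravi</student>"
    "</students>"
)

# DOM: the whole document is loaded into a tree that we can query.
dom = minidom.parseString(XML_TEXT)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("student")]


# SAX: the parser streams through the document and calls our handler.
class StudentHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.sids = []

    def startElement(self, name, attrs):
        if name == "student":
            self.sids.append(attrs.getValue("sid"))


handler = StudentHandler()
xml.sax.parseString(XML_TEXT.encode("utf-8"), handler)

print(dom_names)
print(handler.sids)
```

Note that attribute values come back as strings, which is one reason parsed XML usually needs validation before use.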

JSON:

JSON stands for JavaScript Object Notation. It is a text-based, language-independent format used to represent structured data, and Python ships with a built-in package, json, for working with it. JSON is mainly used to exchange data between a web application and a server, because it travels over the network as text. In Python, JSON text is held in a string until it is parsed. Like a Python dictionary, a JSON object represents data as key-value pairs.
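A short sketch with the built-in json package shows the round trip between JSON text and a Python dictionary. The record contents are invented for the example.

```python
import json

# JSON arrives as text, for example in the body of an HTTP request.
text = '{"name": "Asha", "sid": 7, "marks": 82}'

# json.loads parses the text into Python objects (here, a dict).
record = json.loads(text)

# json.dumps turns Python objects back into JSON text.
encoded = json.dumps(record, sort_keys=True)

print(record["name"], type(record["sid"]).__name__)
print(encoded)
```

Parsing only guarantees that the text is well-formed JSON; it says nothing about whether the keys and value types are the ones the program expects.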

Voluptuous as a Python data validation library

Voluptuous is a Python data validation library used mainly for validating data that comes into Python from formats such as JSON and XML.

Three main goals of the Voluptuous library are:

  • Simplicity
  • Support for complex data structures
  • Useful error messages

While handling data, it is important to accept and use only the data that is required and to discard the rest. Also, when fetching data from a database, we do not always know what type of data we will receive. Voluptuous addresses both problems: it lets users accept only data that matches a particular format.

Why prefer voluptuous over other python validation libraries?

Many validation libraries are available for Python. The points below show why it is advantageous to use Voluptuous over the others.

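  • Simple to use. Validators are plain callables, so there is no need to create a subclass; a function is enough
  • Schemas are written as basic Python data structures
  • Errors are easy to handle because they are raised as ordinary exceptions
  • Consistent rules: types are checked as types, values are compared as values, and callables are called to validate
  • Nested data structures are treated like any other data type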

Example of voluptuous library

First, we have to install the Voluptuous library before importing it.

pip install voluptuous

Next, we import the Schema class from voluptuous.

from voluptuous import Schema

According to the Voluptuous documentation, a schema is a Python tree-like structure where nodes are pattern matched against corresponding trees of values.

Here, nodes can be values, types, or callables. In other words, the schema describes the data structure that we are expecting. It can be a list of integers, a dictionary, or a list of dictionaries.

Let us consider a schema for a school record.

schema = Schema({
   'name': str,
   'sid': int,
   'marks': int,
})

The above schema describes the data required by the API, but it is incomplete: it does not express all of the API's constraints. We therefore have to define a stricter schema. Here, name must be a string of 1 to 20 characters, sid must be an integer between 1 and 20, and marks must be an integer of at least 60. So now, we shall define the schema more precisely.

First, we shall have to import Required, All, Length, Range from voluptuous.

from voluptuous import Required, All, Length, Range

So now, we shall define the new schema.

schema = Schema({
   Required('name'): All(str, Length(min=1, max=20)),
   'sid': All(int, Range(min=1, max=20)),
   'marks': All(int, Range(min=60)),
})

Now let us trigger a validation error. Since name is defined as a string, the schema raises a MultipleInvalid exception if the value passed is not a string.

from voluptuous import MultipleInvalid

try:
  schema({'name': 903})
  raise AssertionError('MultipleInvalid not raised')
except MultipleInvalid as e:
  exc = e
# The message names the expected type and the path of the offending value
assert str(exc) == "expected str for dictionary value @ data['name']"

Similarly, validation fails when name is an empty string, because the name must be at least one character long.

try:
  schema({'name': ''})
  raise AssertionError('MultipleInvalid not raised')
except MultipleInvalid as e:
  exc = e
# The length constraint is reported together with the path to the value
assert str(exc) == "length of value must be at least 1 for dictionary value @ data['name']"

Summary

So, in this article we covered:

  • What is Data Validation
  • JSON and XML
  • Introduction to voluptuous
  • Advantages of Voluptuous Python Library
  • Example of Voluptuous Python Library

FAQs About the Voluptuous Python Library

What is Cerberus in Python?

Cerberus is a simple yet powerful Python library for data validation. It is easy to extend, which lets users define custom validation rules.

What are other Data Validation libraries in python?

Apart from Voluptuous, there are other Python libraries for data validation. Some of them are Cerberus, Colander, schema, valideer, and schematics.

What is a validator in python?

In this context, a validator refers to a validation library such as Validator-Collection, which provides more than 60 functions to validate the type and contents of an input value.
