
Ways to design databases in Android

by admin

As an Android developer, I have come across two different approaches to designing databases for mobile applications. Some of what follows may seem obvious, but perhaps it will give someone a new idea or help them avoid a mistake. Anyway, without a long preamble, let's get to the point…

Two ways of looking at the problem

As you know, universities teach you to build databases by the book: decompose the subject area into entities, pick out attributes and define primary keys, define the relations between entities, bring it all at least to third normal form, and so on. One of the "side effects" of this approach is a drop in read performance: the stronger the decomposition and normalization, the more joins the queries have to perform, and the more records there are in the tables, the longer those queries take.
Add to this the very limited hardware of mobile platforms, in particular the tiny amount of RAM. It is scarce to begin with, and on top of that Android caps the memory available to a process at, depending on the OS version, roughly 16 to 48 MB. Of those few megabytes the DBMS gets only a fraction, because the application itself needs memory too. Finally, SQLite, by its very design, effectively offers only two isolation levels: transactions are either fully serialized or isolation is switched off altogether!
When application performance starts to be limited by the DBMS, an alternative approach, let's call it key-value oriented, can come to the rescue. Instead of decomposing an entity into attributes and creating a separate table column for each of them, the entity is stored "as is" in a single BLOB field, in other words serialized.
Let's look at an example for clarity. Suppose our data model in Java code looks like this:

import java.util.List;

class Group {
    private Long _id;
    private String number;
    private List<Student> students;
    // getters and setters...
}

class Student {
    private Long _id;
    private String name;
    private String surname;
    private Group group;
    // getters and setters...
}

Thus, in the "standard" version, we get two tables with the corresponding sets of attributes.

-- Group is a reserved word in SQLite, so the table name has to be quoted
create table "Group"(_id integer primary key autoincrement, number text);
create table Student(_id integer primary key autoincrement, name text, surname text, group_id integer references "Group"(_id));

In a real project there are far more entities and attributes, and various service fields get added as well, such as the date of the last synchronization with the server or a flag indicating whether the entity has changed and needs to be sent back to the server, and so on.
With the key-value approach the tables will look like this:

create table "Group"(_id integer primary key autoincrement, value blob);
create table Student(_id integer primary key autoincrement, value blob, group_id integer references "Group"(_id));

In this case, groups and students are serialized separately into different tables. Or simply like this:

create table "Group"(_id integer primary key autoincrement, value blob);

if the group is serialized together with all of its students into the same table.
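To make that second option concrete, here is a minimal sketch of how such a record could be written, assuming Group and Student implement java.io.Serializable and an open SQLiteDatabase is at hand (the helper class and its name are purely illustrative):

import android.content.ContentValues;
import android.database.sqlite.SQLiteDatabase;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Illustrative helper: stores a Group together with its students as one BLOB row.
// Assumes Group (and Student) implement java.io.Serializable.
class GroupBlobWriter {

    long insert(SQLiteDatabase db, Group group) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(group);                       // serializes the whole object graph
        out.close();

        ContentValues values = new ContentValues();
        values.put("value", bytes.toByteArray());     // the single BLOB column
        return db.insert("\"Group\"", null, values);  // Group is a SQL keyword, hence the quotes
    }
}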
Let’s look at the advantages and disadvantages of both approaches and how you can benefit from them.

Comparison of approaches, pros and cons

Possibilities of relational algebra

With the standard approach we get all the advantages of the relational model we are so used to, above all the SQL language for conveniently selecting, filtering and sorting data, as well as for modifying the database schema. To retrieve a collection of entities, we just formulate the required condition and fetch the data from the database. With the key-value approach, on the other hand, the task of filtering or ordering the data falls on the developer's shoulders.
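For example, under the normalized schema above, fetching the students of one group in alphabetical order is a single query; in this sketch (class and method names are illustrative) the join and the ordering are left entirely to SQLite:

import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

import java.util.ArrayList;
import java.util.List;

// Illustrative helper: with the normalized schema, filtering and sorting is one query.
class StudentQueries {

    // Surnames of all students in the given group, ordered alphabetically.
    List<String> surnamesInGroup(SQLiteDatabase db, String groupNumber) {
        List<String> result = new ArrayList<>();
        Cursor cursor = db.rawQuery(
                "select s.surname from Student s " +
                "join \"Group\" g on s.group_id = g._id " +
                "where g.number = ? order by s.surname",
                new String[]{groupNumber});
        try {
            while (cursor.moveToNext()) {
                result.add(cursor.getString(0));
            }
        } finally {
            cursor.close();
        }
        return result;
    }
}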

DB file size

With the standard approach the database file is usually smaller, because normalization removes redundancy from the stored data. In theory, the higher the degree of normalization, the less redundancy, but also the higher the read load on the database, since significant resources are spent on joining tables. With the key-value approach the degree of redundancy is higher, because the level of normalization is, as a rule, much lower, and the database file grows accordingly.

Flexibility to change the database schema

As a project evolves, the database schema is usually reworked more than once: new fields are added, old ones are removed, entities are split into several new ones or, on the contrary, denormalized and several tables are merged into one. If we can afford to throw away the accumulated data when updating the schema, everything is simple: on every schema update we create a new database file and delete the old one. But what if the data has to be preserved and converted to the new format?
In this case the standard approach has the advantage. It is enough to write the appropriate update scripts, which convert the schema to the required form and fill the new fields with default values or compute them with some logic. With serialization, updating the schema is not such an easy task: we have to transform the schema while keeping all the data intact, and update the data itself by deserializing it, initializing the new fields and serializing it back. This increases both the logical complexity of the operation and the time the update takes.
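For the standard approach, such an update script is just a few statements in SQLiteOpenHelper.onUpgrade(); the sketch below assumes a hypothetical version 2 that adds a last_sync service field (the database name, field name and version are made up for illustration):

import android.content.Context;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;

// Illustrative sketch: in the standard approach a schema change is an update script.
// Version 2 (hypothetically) adds a "last_sync" service field with a default value;
// existing rows are migrated by SQLite itself, no deserialization is involved.
class StudentsDbHelper extends SQLiteOpenHelper {

    StudentsDbHelper(Context context) {
        super(context, "students.db", null, /* version */ 2);
    }

    @Override
    public void onCreate(SQLiteDatabase db) {
        db.execSQL("create table \"Group\"(_id integer primary key autoincrement, number text)");
        db.execSQL("create table Student(_id integer primary key autoincrement, "
                + "name text, surname text, group_id integer references \"Group\"(_id), "
                + "last_sync integer default 0)");
    }

    @Override
    public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
        if (oldVersion < 2) {
            db.execSQL("alter table Student add column last_sync integer default 0");
        }
    }
}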

Synchronizing access to entity instances

One of the major drawbacks of the key-value approach, it seems to me, is that in order to change just one field in an entity we need to deserialize the whole object. This makes accessing objects much more difficult. For example, in the case where a group is serialized into the database along with all the students, in order to change the last name of one of the students, we need to take the entire group out of the database, change one last name, and save it back. If the application has multiple threads, services, and/or content providers that can work with the same entities, the task gets a lot more complicated. The more potential "writers" there are, the more blocking will occur and the harder it will be for us to provide synchronized access to the entities. In the case of the standard approach, this problem is solved at the DBMS level.
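To illustrate the difference, here is a hedged sketch under the same assumptions as before (Serializable model classes; the getter and setter names on Group and Student are assumed, everything else is hypothetical). The standard approach changes a surname with one UPDATE, while the key-value variant has to do a read-modify-write of the whole group, preferably inside a transaction:

import android.content.ContentValues;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustrative contrast; getter/setter names on Group and Student are assumed.
class SurnameUpdater {

    // Standard approach: one UPDATE, row-level consistency is the DBMS's concern.
    void renameStudent(SQLiteDatabase db, long studentId, String newSurname) {
        ContentValues values = new ContentValues();
        values.put("surname", newSurname);
        db.update("Student", values, "_id = ?", new String[]{String.valueOf(studentId)});
    }

    // Key-value approach: read the whole group, deserialize, change one field,
    // serialize everything and write it back. The read-modify-write is wrapped in a
    // transaction so concurrent writers do not overwrite each other's changes.
    void renameStudent(SQLiteDatabase db, long groupId, long studentId, String newSurname)
            throws IOException, ClassNotFoundException {
        db.beginTransaction();
        try {
            Cursor cursor = db.rawQuery("select value from \"Group\" where _id = ?",
                    new String[]{String.valueOf(groupId)});
            try {
                if (!cursor.moveToFirst()) {
                    return;
                }
                Group group = (Group) new ObjectInputStream(
                        new ByteArrayInputStream(cursor.getBlob(0))).readObject();

                for (Student s : group.getStudents()) {
                    if (s.getId() != null && s.getId() == studentId) {
                        s.setSurname(newSurname);
                    }
                }

                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                new ObjectOutputStream(bytes).writeObject(group);
                ContentValues values = new ContentValues();
                values.put("value", bytes.toByteArray());
                db.update("\"Group\"", values, "_id = ?", new String[]{String.valueOf(groupId)});
                db.setTransactionSuccessful();
            } finally {
                cursor.close();
            }
        } finally {
            db.endTransaction();
        }
    }
}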

Performance

On the one hand, the key-value approach can give better performance when fetching small amounts of data: the number of joins goes down, so an individual query, and the DBMS as a whole, runs faster. On the other hand, with large amounts of data, if we need to filter or sort by a field that is serialized together with the rest of the object, we first have to read all the entities and only then throw away the excess, which can turn the expected gain into an even bigger loss. Alternatively, the fields involved in filtering or sorting can be stored as ordinary columns in the standard way, with the rest of the entity kept as a BLOB, but such a hybrid is hard to maintain.
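This is roughly what that full scan looks like with the single-table key-value layout from above (again assuming Serializable model classes; class and method names are illustrative):

import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative full scan: to sort students by a field that lives inside the BLOB,
// every group has to be read and deserialized before any filtering or sorting can start.
class StudentScan {

    List<Student> allStudentsSortedBySurname(SQLiteDatabase db)
            throws IOException, ClassNotFoundException {
        List<Student> students = new ArrayList<>();
        Cursor cursor = db.rawQuery("select value from \"Group\"", null);
        try {
            while (cursor.moveToNext()) {
                Group group = (Group) new ObjectInputStream(
                        new ByteArrayInputStream(cursor.getBlob(0))).readObject();
                students.addAll(group.getStudents());
            }
        } finally {
            cursor.close();
        }
        Collections.sort(students, new Comparator<Student>() {
            @Override
            public int compare(Student a, Student b) {
                return a.getSurname().compareTo(b.getSurname());
            }
        });
        return students;
    }
}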

Code volume

With the standard approach the amount of SQL code grows: scripts for creating and modifying the schema, queries and conditions, DAO objects and so on. With the key-value approach this kind of code shrinks, but the amount of code that performs sorting, grouping and filtering grows, because all of it has to be done "by hand", whereas in the standard approach the database does it and we only need to write the right query.

Serialization

A downside of the key-value approach can be the performance hit from standard Java serialization and deserialization, which is notoriously slow. As an alternative you can use one of the libraries that solve this problem, for example protobuf from Google. Besides speed, an extra plus of protobuf is versioning: the protocol tolerates schema evolution, so older and newer versions of an object can coexist.
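Very roughly, and only as an assumption about what this might look like: if a .proto file mirroring the model above is compiled with protoc, the generated classes (here hypothetically named StudentProto.Group and StudentProto.Student) produce compact binary blobs for the value column and skip unknown fields on read, which is where the versioning comes from:

import java.io.IOException;

// Illustrative only: StudentProto.Group / StudentProto.Student are assumed to be
// classes generated by protoc from a .proto file mirroring the model above.
class ProtoGroupCodec {

    byte[] serialize(Group group) {
        StudentProto.Group.Builder builder = StudentProto.Group.newBuilder()
                .setNumber(group.getNumber());
        for (Student s : group.getStudents()) {
            builder.addStudents(StudentProto.Student.newBuilder()
                    .setName(s.getName())
                    .setSurname(s.getSurname()));
        }
        return builder.build().toByteArray();   // compact binary form for the BLOB column
    }

    StudentProto.Group deserialize(byte[] blob) throws IOException {
        // Fields added in newer schema versions are simply absent here and get defaults;
        // unknown fields written by newer code are skipped on parsing.
        return StudentProto.Group.parseFrom(blob);
    }
}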

Conclusion

It turned out a bit chaotic, but overall I would say that both approaches are good; you have to choose according to the situation, weighing all the pros and cons listed above. As a rule, if there are no performance problems, it is better to use the standard approach, which is more flexible. If problems do appear, try denormalization first; if there are only a few critical spots in the program, that alone may solve everything. And if performance problems keep coming back and denormalization no longer saves you, it is worth taking a closer look at the key-value approach.
