Generating Database Schema With SQL and GraphViz

Tue 20 October 2009

At work recently I've been asked to dig into a legacy app running queries against Oracle. One of the challenges with projects like this is determining the existing schema and how the tables are related. Typically you can read the internal documentation or review the existing code, but in this case it seems the documentation was never written, and nobody knows how to build the client app or which copy of the source code is in use. The best option here is to query the database itself for this information. For those playing at home, Oracle is not required; in fact, even SQLite has sufficient instrumentation. If you use Firefox or Liferea or Banshee you will have an SQLite database to inspect.

There are popular DB admin tools in Ubuntu like TOra, but they lack a decent diagram generator. If we can extract the schema from the DB, it should be straightforward to pass this off to GraphViz. In Oracle, the Oracle Data Dictionary tables provide you with all features to an Oracle schema. In SQLite, schemas are stored in the table SQLITE_MASTER. You can easily retrieve it with a command like:

echo ".schema" | sqlite3 ~/.liferea_1.4/liferea.db >> liferea.sql

Since this information is readily available on the web, it's inevitable someone on the internet has done the obvious and wrote a translator from SQL schema to DOT. I'll share two that I've found that are open source and useful: SchemaSpy and SQLFairy.


SchemaSpy is a Java based tool to connect to a database and generate documentation. It uses JDBC, and outputs a wealth of information in HTML and PNG. In addition to the ER diagram basics, it captures constraints, table sizes, and offers some rudimentary static analysis of the schema. For example, if you have a table named students and a column named student_id, it will suggest this as an implied primary key if it's in the same table, or a foreign key if it's outside the student table. It generates ER diagrams with and without these implied relations, and uses AJAX to switch between generated images. Check out the sample output to see exactly what it generates. The Windows builds of graphViz appear to do a poor job of rendering, but I like the color coding of indexes and keys, and how the lines relate specific fields:

schema diagram for library database

The above diagram format is fairly handy (but rendered poorly; Ubuntu's graphviz does a better job) when the schema is rational. In my case, there were substantial missing relations and missing primary keys. It was bad enough that even the implied relations are wrong. Fortunately, SchemaSpy leaves the intermediary graphviz DOT files around, so I can go in and fix the output for a few tables. It even provides a -meta parameter for cases like this, but it's easy to fix one table diagram and then fix the actual problem in the schema. Overall, it's a success for my Oracle DB reverse engineering task.

The output is fancy enough that I'm tempted to try it on the personal databases I have. SchemaSpy relies on JDBC, and I've not been able to locate a JDBC driver for SQLite that handles the metadata requests needed. Plus, SQLite is special among DBs, in that its goal is to be the simplest thing that could possibly work. This places the analysis steps somewhere between pointless and impossible. In this case -meta might be handy for generating documentation without having both triggers and foreign key constraints in the schema. If you can find a driver to load it in the first place.

SQL Fairy

SQLite, though simple, is becoming more popular in desktop apps, so it'd be productive to have a tool to document these on-disk formats. I know when I was chasing enclosure handling in Liferea, documentation about the internal schema would have saved time. But db.c was the second largest source file, behind only a SWIG autogenerated header file to wrap scripting languages. Generating readable-at-a-glance documentation would make it easier to see how enclosures are handled within Liferea.

So while I can't convince SchemaSpy to hit up SQLite databases, I know what I want is possible, and easy enough that surely someone's done it. This is the Internet, after all. A fair amount of searching revealed SQLFairy; it's listed on the Graphviz Resources, to my chagrin. SQLFairy is a set of Perl scripts with a main goal of manipulating schemas: translating, diffing, and diagramming. It's not much, but it does do the bare minimum: generate a diagram. It won't flag poorly built schemas or summarize table sizes, but it does translate tables and relationships into DOT and hand them off to GraphViz.

SQLite presents some unique challenges beyond lacking a JDBC driver; it only started enforcing foreign keys a few days ago, so software needing this today works around that limitation with triggers. Fortunately, SQLFairy also supports implied relationships, but they call them "natural joins", overloading normal DB terminology. Using the liferea.sql schema dump, we can use sqlt-graph to generate an SVG for high quality printing:

sqlt-graph -c --natural-join --from=SQLite -t svg -o liferea_schema.svg liferea.sql

schema diagram for liferea database

You can then load that up in Inkscape and target whatever paper you've got or tweak the diagram. We have some 11x17 ledger paper at work that showcases these diagrams very nicely. Since SVG output is also XML, you could run some XSLT output processing to style it, but without more metadata in the XML, about all I can get is drop shadows on boxes. It also doesn't do relations very well, drawing relations between tables rather than between fields within tables, because it doesn't properly make use of Graphviz records. As bad as the above graph is, it's worse on Ubuntu 9.04 and 9.10, which carry versions from before SQLFairy upstream revisited the graphviz output. There's no reason this can't be fixed to be close to SchemaSpy quality diagrams, although the two projects use different (possibly compatible) licensing.


Right now it seems like SchemaSpy is a great tool for documenting your server oriented database; if you generate documentation for your project it's worth having a look at adding it to the doc target. On the other hand, SQLite support is not as robust, even as adoption is growing. The newly announced FK support gives me hope that most apps can be easily changed to be more documentation amenable. If anyone can get SchemaSpy to work with SQLite, please let me know how!

Comments !