Databases

Jonathon

Database:

An organized collection of related data.

A database could be a collection of alphabetized paper files.

An electronic database uses software to store data in an organized way.

The software allows the user to update, view and search data.

Why not just store the database your program needs in a text file?

Problems with text files

  • Speed when database (file) becomes large.
  • Only one user can update file at a time.
  • Only simple data structures can be (easily) implemented.
  • Hard to share data.
  • Every access function must be written by hand.

Database software solves all these problems and solves problems that do not arise until the database becomes large.

There are many different kinds of software systems that call themselves databases. For the purpose of this lecture, I will limit the material to relational databases.

Relational Databases use Tables to Store Data

Course Name Number Dept Credits
Intro to Python CAP101 CS 3
Stat Methods I STA511 MA 3
Intro to Algorithms COT401 CS 3


A conceptual relational database table is shown above.

Each row in a database represents a unique record in the table. Each column in a row is called a field. Each field can hold different types of data.

The rows of a database table are different from each other. Rows are uniquely identified by a value called a primary key. In this example, the Course Number is the primary key.

A relational database typically consists of multiple tables. The tables are related by identical column values. The primary key of one table can be listed in the column of another table. When this occurs, this value is called a foreign key. Use of a foreign key allows the linking of tables together.

Course Name Number Dept Credits
Intro to Python CAP101 CS 3
Stat Methods I STA511 MA 3
Intro to Algorithms COT401 CS 3


StudentID Term Number
S0011 F13 CAP101
S0012 F13 STA511
S0099 F13 COT401


In this case the Course Number is a foreign key in the student schedule table.

In the previous example, the keys are human-readable strings. In modern Relational DataBase Management Systems (RDBMS), keys are auto-generated.

Would you want to create by hand a set of unique identifiers for a million row table?

Relational databases are built to not have redundant data. For a university database, the student personal information would be stored in one table. The primary key of that table would link any and all data in that table to related tables such as Course Schedule or Final Grades. The student's name, for example, would not be repeated in these tables.

Why would you NOT want to repeat the same data in multiple tables?

The degree to which databases succeed in preventing redundancy is captured in a set of definition called "Normal Forms". Further discussion is outside the scope of the course.

For some applications, the database will intentionally have tables with redundant data. This is called de-normalized data.

Why might an organization want to do this?

Using a database

The software developer will interact with the database via a management tool, a library or for precise work, Structured Query Language (SQL).

In a language like Python, you write the step by step procedure of what you want the program to do. These languages are called procedual languages. SQL is different. Typically, you write code or a statement that describes the result you wish to achieve. Such languages are called declarative languages.

Some SQL

SELECT student_name, phone_number FROM student_info

UPDATE student_info
  SET phone_number = '03292524583'
  WHERE student_id = 'S00019'

SELECT student.student_name, schedule.course_number
  FROM student_info
  INNER JOIN schedule
  ON student.student_id = schedule.student_id
  ORDER BY student.student_name


SQL is a rich language and has been in use for over 40 years. Learning SQL is a separate course from Python.

Advantages of using an RDBMS

  • Scalability
  • Distributed (multiple users can access over network)
  • Reduced redundancy of data
  • Backup and recovery
  • Transactions (logging of who made what changes when)
  • Fine-grained security
  • Can represent complex structures
  • Complex queries/reports possible.

Python and DB

The Python Database API (DB-API) defines a standard interface for Python database access modules. It’s documented in PEP 249. Most modern databases support PEP 249.

Example using MySQL

#!/usr/bin/python
# -*- coding: utf-8 -*-

import MySQLdb as mdb

con = mdb.connect('localhost', 'testuser', 'test623', 'testdb');

with con:
    cur = con.cursor()
    cur.execute("SELECT * FROM Writers")
    rows = cur.fetchall()
    for row in rows:
        print row

con.close()

Questions?????