How to Normalize your data - Concept
So what's all this about Normalizing my data, and not
using Flat file format that I keep hearing about. This section will give you a
no nonsense explanation of what this all means for the absolute beginner.
What's a flat file? Think of an excel spreadsheet. You may have a
sheet containing details of the number of hours someone has worked on a
particular project each week or month. In excel it may be laid out like this.
| Person
| Department
| Project 1 Jan
| Project 2 Jan
| Project 3 Jan
| Project 1 Feb
| Project 2 Feb
| Project 3 Feb
|
| John |
Design |
64 |
3 |
16 |
7 |
14 |
23 |
| Fred |
Sales |
4 |
2 |
5 |
12 |
0 |
3 |
| Sarah |
Admin |
54 |
6 |
13 |
19 |
8 |
10 |
| James |
Design |
23 |
7 |
13 |
8 |
3 |
13 |
This is a flat format, and at one time every body has probably used it until
the point when they got to December and thought this sheet is getting a bit
wide, I know I'll start a second sheet in Excel and call it 2001. All very
inefficient. What happens if someone says what percentage of hours was worked by
the Design team during Feb on Project 2 you would calculate it by hand. Not
clever or impressive.
You could also put the data into Access in exactly the same format in a
table. It would not be any better. It would be just as useless and time
consuming. Lets normalize it.
What's Normalization? Think of it as a way of splitting your data into
components. Each component links to other related components. This is what gives
us our relational database design.
Why would we want to split stuff up? What happens five years down the
line when you have 500 staff and 2000 projects. Your spreadsheet is very unwieldy
now.
How do we Normalize our Data? Look at the sheet above what components
do we have? There are three immediate ones. Staff, Projects, and Hours worked.
So lets split them into three tables.
tbl_Staff Contains our staff personnel info.
| FirstName
| LastName
| Address
| StaffID
| Level
| Department
|
| John |
Hawkins |
Somewhere Close |
1 |
President |
Design |
| Fred |
Bloggs |
Anywhere Place |
2 |
Vice President |
Sales |
| Sarah |
Hawkins |
Thingy Street |
3 |
Assistant Vice President |
Admin |
| James |
Khan |
Wotsit Road |
4 |
DBA |
Design |
tbl_Projects Contains all the details about the projects
| ProjectName
| Company
| ProjectID
| CompleteDate
| Cost
| Comments
|
| WebSite |
Fabalou Web |
1 |
23/12/2000 |
$1000 |
In progress |
| FireFox2 |
BAA |
2 |
14/3/2000 |
$32,000 |
In bug test |
| WhiteSnake |
Epic |
3 |
5/6/2000 |
$21 |
On back burner |
tbl_HoursWorked This is the linking table between staff and projects
and contains one record for each person per month per project
| StaffID
| ProjectID
| Month
| Hours
|
| 1 |
1 |
Jan |
64 |
| 2 |
1 |
Jan |
4 |
| 3 |
1 |
Jan |
54 |
| 4 |
1 |
Jan |
23 |
| 1 |
2 |
Jan |
3 |
| 2 |
2 |
Jan |
2 |
| 3 |
2 |
Jan |
6 |
| 4 |
2 |
Jan |
7 |
| 1 |
3 |
Jan |
16 |
| 2 |
3 |
Jan |
5 |
| 3 |
3 |
Jan |
13 |
| 4 |
3 |
Jan |
13 |
| 1 |
1 |
Feb |
7 |
| 2 |
1 |
Feb |
12 |
| 3 |
1 |
Feb |
19 |
| 4 |
1 |
Feb |
8 |
| 1 |
2 |
Feb |
14 |
| 2 |
2 |
Feb |
0 |
| 3 |
2 |
Feb |
8 |
| 4 |
2 |
Feb |
3 |
| 1 |
3 |
Feb |
23 |
| 2 |
3 |
Feb |
3 |
| 3 |
3 |
Feb |
10 |
| 4 |
3 |
Feb |
13 |
Looks like a lot of hard work for nothing doesn't it. But think now if you
want to add a new project. You just add a single project record to the project
table. In excel you would have to insert a new column for each month. Lets go back
to the problem of percentage of time spent on Project 2 by Design team in
Feb. You would create a new query and pull in all three tables. Linking the
StaffID and ProjectID. Set criteria to show only Feb stuff and only design team.
Then group by the data and sum it. You could get the query to ask for the group
and month as it opens so next time someone asks for those details you can pull
them up in a matter of seconds.
This is my first attempt at explaining normalization if you didn't understand
it then please drop me an email so we can discuss it and make more sense of it.
It's not the easiest thing to understand why it's done, but bear with it and a
couple on months of it and you'll wonder how you ever lived without it.
So now you know the concept. How do you actually do it for thousands of records???
Normalization - How To is a follow on from this document
and shows how to actual bulk transfer the data
|