How to Normalize your data - Procedure
So you've read all about what normalizing your data means and now have an action plan of
where the data is going and all your new tables.
But how on earth do I convert the excel spreadsheet with 20000 records in it. Thats a lot of new data entry
Don't panic, most flat file can be converted. It isn't easy but you can automate the largest part of it using some very clever queries.
Ok our first objective is too get the data into access. Use whatever method you like to import the original
data into access and call the table tbl_original
Next very importantly add an autonumber field to this table so each record has a unique ID we can refer back to.
Lets call this new field ID
Let's take our previous example and add the ID field
tbl_Original - ID field has been added
| ID |
Person
| Department
| Project 1 Jan
| Project 2 Jan
| Project 3 Jan
| Project 1 Feb
| Project 2 Feb
| Project 3 Feb
|
| 1 |
John |
Design |
64 |
3 |
16 |
7 |
14 |
23 |
| 2 |
Fred |
Sales |
4 |
2 |
5 |
12 |
0 |
3 |
| 3 |
Sarah |
Admin |
54 |
6 |
13 |
19 |
8 |
10 |
| 4 |
James |
Design |
23 |
7 |
13 |
8 |
3 |
13 |
Ok the easy one is tbl_Staff, which shows our staff. All you have to do is copy the table in access (data and structure)
and then delete the fields you don't need (Don't delete the ID field). Then rename the ID field to StaffID
tbl_Staff Here's what we end up with
| FirstName
| LastName
| StaffID
|
| John |
Hawkins |
1 |
| Fred |
Bloggs |
2 |
| Sarah |
Hawkins |
3 |
| James |
Khan |
4 |
Ok next we build the projects table and populate it. This step would be manual
for our example as none of the details were tracked in the original flat file.
But for you it may be the case that you can pull some details from the table
tbl_Projects Contains all the details about the projects
| ProjectName
| Company
| ProjectID
| CompleteDate
| Cost
| Comments
|
| WebSite |
Fabalou Web |
1 |
23/12/2000 |
$1000 |
In progress |
| FireFox2 |
BAA |
2 |
14/3/2000 |
$32,000 |
In bug test |
| WhiteSnake |
Epic |
3 |
5/6/2000 |
$21 |
On back burner |
Time for the big one. How do we get all that hours worked data into our HoursWorked table
without lots of manual data entry?
The answer lies with union queries.
Build tbl_HoursWorked first. It should have the following fields
| StaffID
| ProjectID
| Month
| Hours
|
|   |
  |
  |
  |
Lets imagine we are just doing one of our projects, and we are just handling 1 month. (Project 1 Jan)
We need to select the staffID (which is the same as our ID in tbl_Original) and the Hours and
state the Month and the project ID
SELECT tbl_Original.ID AS
StaffID,
tbl_Original.[Project 1 Jan] AS Hours,
1 AS ProjectID,
'Jan' AS [Month]
FROM tbl_Original; |
So we have to specify the projectID and month but it is picking up the Hours and the staffid. So this sql would output
| StaffID
| ProjectID
| Month
| Hours
|
| 1 |
1 |
Jan |
64 |
| 2 |
1 |
Jan |
4 |
| 3 |
1 |
Jan |
54 |
| 4 |
1 |
Jan |
23 |
Excellent we are part of the way to our final table.
So now lets make a big union query which does all 6 columns
SELECT tbl_Original.ID AS StaffID, tbl_Original.[Project 1 Jan] AS Hours, 1 AS ProjectID, 'Jan' AS [Month]
FROM tbl_Original
UNION SELECT ALL tbl_Original.ID AS StaffID, tbl_Original.[Project 2 Jan] AS Hours, 2 AS ProjectID, 'Jan' AS [Month]
FROM tbl_Original
UNION SELECT ALL tbl_Original.ID AS StaffID, tbl_Original.[Project 3 Jan] AS Hours, 3 AS ProjectID, 'Jan' AS [Month]
FROM tbl_Original
UNION SELECT ALL tbl_Original.ID AS StaffID, tbl_Original.[Project 1 Feb] AS Hours, 1 AS ProjectID, 'Feb' AS [Month]
FROM tbl_Original
UNION SELECT ALL tbl_Original.ID AS StaffID, tbl_Original.[Project 2 Feb] AS Hours, 2 AS ProjectID, 'Feb' AS [Month]
FROM tbl_Original
UNION SELECT ALL tbl_Original.ID AS StaffID, tbl_Original.[Project 3 Feb] AS Hours, 3 AS ProjectID, 'Feb' AS [Month]
FROM tbl_Original;
|
So each time the field to pull from changes and the projectID and month is reflected.
Run the query and heres what we get
| StaffID
| ProjectID
| Month
| Hours
|
| 1 |
1 |
Jan |
64 |
| 2 |
1 |
Jan |
4 |
| 3 |
1 |
Jan |
54 |
| 4 |
1 |
Jan |
23 |
| 1 |
2 |
Jan |
3 |
| 2 |
2 |
Jan |
2 |
| 3 |
2 |
Jan |
6 |
| 4 |
2 |
Jan |
7 |
| 1 |
3 |
Jan |
16 |
| 2 |
3 |
Jan |
5 |
| 3 |
3 |
Jan |
13 |
| 4 |
3 |
Jan |
13 |
| 1 |
1 |
Feb |
7 |
| 2 |
1 |
Feb |
12 |
| 3 |
1 |
Feb |
19 |
| 4 |
1 |
Feb |
8 |
| 1 |
2 |
Feb |
14 |
| 2 |
2 |
Feb |
0 |
| 3 |
2 |
Feb |
8 |
| 4 |
2 |
Feb |
3 |
| 1 |
3 |
Feb |
23 |
| 2 |
3 |
Feb |
3 |
| 3 |
3 |
Feb |
10 |
| 4 |
3 |
Feb |
13 |
Fantastic. Just copy the output and paste it into tbl_HoursWorked
|