McAuley Bibliography - Parsing Instructions
This page is linked under Instructions from here: http://www.lib.sfu.ca/mcauley_bibliography
Procedure
- 1. Download a bad_citation_?.txt file from the previous page
- 2. Open the bad_citation_?.txt file in your word processor (or text file editor)
- 3. Parse the file according to the instructions below
- 4. Save the file as a "TEXT" file
- 5. Email the completed bad_citation_?.txt file to calvinm@sfu.ca
- 6. Repeat for the next bad_citation_?.txt file
Headings
1 | Author |
2 | Title |
3 | Source |
4 | Volume & Issue |
5 | Date |
6 | Page(s) |
7 | Annotation |
How to parse a bad_citation file
Each file that needs to be parsed is separated into records. The beginning of a record is identified by this line:
# BEGINNING OF RECORD [23068] line [36899]
The end of a record is identified by this line:
# END OF RECORD [23605]
The beginning of a record is followed by a list of the errors reported by the automated parser. These lines begin with three asterisks, i.e. ***, and give some idea of why the automated parser rejected the record.
To parse the record, split it into the fields indicated by the headings above. The easiest way to do this is to move the cursor to where a field ends and hit Enter, then label the next field with its corresponding number followed by a tab (shown as [tab] in the examples below). Not all fields will have data for every record. If a field has no data, denote this by entering a hyphen '-' as its entry.
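The record layout described above can be sketched as a small splitter. This is only an illustration, not the project's actual parser: the function name split_records and the dictionary keys are hypothetical, and the code assumes the marker lines look exactly like the samples shown, with the *** error lines and _heading/_sub lines each on their own line.

```python
import re

# Marker formats copied from the samples above; all helper names are hypothetical.
BEGIN_RE = re.compile(r"^# BEGINNING OF RECORD \[(\d+)\] line \[(\d+)\]")
END_RE = re.compile(r"^# END OF RECORD \[(\d+)\]")


def split_records(text):
    """Return a list of dicts, one per record found in a bad_citation file."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        begin = BEGIN_RE.match(line)
        if begin:
            current = {
                "id": int(begin.group(1)),
                "line": int(begin.group(2)),
                "errors": [],    # lines that start with ***
                "headings": {},  # _heading, _sub1, ... (assumed key = value)
                "body": [],      # everything else, i.e. the citation text
            }
            continue
        if current is None:
            continue  # ignore anything outside a record
        if END_RE.match(line):
            records.append(current)
            current = None
            continue
        # Drop a leading '#' so bare separator lines become empty.
        content = line.lstrip("#").strip()
        if not content:
            continue
        if content.startswith("***"):
            current["errors"].append(content.lstrip("*").strip())
        elif content.startswith("_") and "=" in content:
            key, _, value = content.partition("=")
            current["headings"][key.strip()] = value.strip()
        else:
            current["body"].append(content)
    return records
```

Calling split_records on the text of a bad_citation file gives one entry per record, which makes it easy to see the reported errors next to the citation they belong to.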
Original citation looks like this:
Fetherling, Doug. 'Media: Dear diary: the journal of a journalist'. Q&Q 48 no 10 (Oct 82): 18A
Parsed citation looks like this:
1 [tab] Fetherling, Doug.
2 [tab] 'Media: Dear diary: the journal of a journalist'.
3 [tab] Q&Q
4 [tab] 48 no 10
5 [tab] (Oct 82):
6 [tab] 18
7 [tab] -
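For anyone checking their work with a script, the labelled layout above could be produced along these lines. This is a minimal sketch: format_citation is a hypothetical helper, and it assumes that [tab] stands for a literal tab character and that fields are always written in order 1 to 7.

```python
# Field names from the Headings table; the helper itself is hypothetical.
FIELDS = ["Author", "Title", "Source", "Volume & Issue", "Date", "Page(s)", "Annotation"]


def format_citation(fields):
    """Render a dict {field number: text} as seven 'number<TAB>value' lines.

    Fields that are missing or blank are written as a single hyphen.
    """
    lines = []
    for number in range(1, len(FIELDS) + 1):
        value = (fields.get(number) or "").strip() or "-"
        lines.append(f"{number}\t{value}")
    return "\n".join(lines)


example = format_citation({
    1: "Fetherling, Doug.",
    2: "'Media: Dear diary: the journal of a journalist'.",
    3: "Q&Q",
    4: "48 no 10",
    5: "(Oct 82):",
    6: "18",
    # field 7 (Annotation) has no data, so it is written as '-'
})
```

The hyphen for empty fields follows the rule given in the instructions; everything else about the helper is an assumption made for illustration.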
Example 1
Original citation looks like this:
# BEGINNING OF RECORD [23068] line [36899]
#
#
*** Has one or more semi-colon(s)
*** Too many colons - :
#
#
_heading = PERIODICALS & PERIODICAL PUBLISHING
_sub1 = LIBRARY & LIBRARY SCIENCE
_sub2 =
_sub3 =
_sub4 =
#
#
'Part IV: Directory of Canadian Library Periodicals'. Can Library 19 no 4 Part II (Jan 63): 304-05; 21 no 4 (Jan 65): 291-92
#
#
# END OF RECORD [23068]
This citation is really two citations with the same headings, title and source. You still need one complete set of fields 1-7 for the first citation. For each subsequent citation, label only the fields that differ from the first. In this case the second citation has a different volume/issue, date, and pages, so the only fields that need to be shown are 4, 5 and 6.
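The rule above, where later citations list only the fields that differ, can be sketched as follows. This is an assumption about how the partial citations might be expanded afterwards, not a description of the project's own tooling: expand_citations is a hypothetical helper, and it assumes well-formed 'number<TAB>value' lines with a blank line between citations (see Notes to remember).

```python
def expand_citations(parsed_text):
    """Split labelled lines into citations and fill in inherited fields.

    Citations are assumed to be separated by a blank line. Any field a later
    citation does not list is copied from the first citation in the record.
    """
    blocks = []
    current = {}
    for line in parsed_text.splitlines():
        if not line.strip():
            # A blank line closes the citation collected so far.
            if current:
                blocks.append(current)
                current = {}
            continue
        number, _, value = line.partition("\t")
        current[int(number)] = value.strip()
    if current:
        blocks.append(current)
    if not blocks:
        return []
    first = blocks[0]
    # Later dict entries win, so a citation's own fields override the first's.
    return [{**first, **block} for block in blocks]
```

For Example 1, the second citation would inherit fields 1, 2, 3 and 7 from the first and keep its own values for fields 4, 5 and 6.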
Parsed citation looks like this:
# BEGINNING OF RECORD [23068] line [36899]
#
#
*** Has one or more semi-colon(s)
*** Too many colons - :
#
#
_heading = PERIODICALS & PERIODICAL PUBLISHING
_sub1 = LIBRARY & LIBRARY SCIENCE
_sub2 =
_sub3 =
_sub4 =
#
#
1 [tab] -
2 [tab] 'Part IV: Directory of Canadian Library Periodicals'.
3 [tab] Can Library
4 [tab] 19 no 4 Part II
5 [tab] (Jan 63):
6 [tab] 304-05;
7 [tab] -

4 [tab] 21 no 4
5 [tab] (Jan 65):
6 [tab] 291-92
#
#
# END OF RECORD [23068]
Example 2
Original citation looks like this:
# BEGINNING OF RECORD [23605] line [37575]
#
#
*** Source may be incorrect
#
#
_heading = PERIODICALS & PERIODICAL PUBLISHING
_sub1 = PUBLISHER/LIBRARIAN RELATIONS
_sub2 =
_sub3 =
_sub4 =
#
#
Carver, Richard. 'National Library Services for Periodical Publishers'. CPPA Source no 17. CPPA newsl 82 (Sep 83): 1-4
#
#
# END OF RECORD [23605]
Again, this citation is really two citations. Following the instructions in Example 1 gives the citations below.
Parsed citation looks like this:
# BEGINNING OF RECORD [23605] line [37575]
#
#
*** Source may be incorrect
#
#
_heading = PERIODICALS & PERIODICAL PUBLISHING
_sub1 = PUBLISHER/LIBRARIAN RELATIONS
_sub2 =
_sub3 =
_sub4 =
#
#
1 [tab] Carver, Richard.
2 [tab] 'National Library Services for Periodical Publishers'.
3 [tab] CPPA Source
4 [tab] no 17.
5 [tab] -
6 [tab] -
7 [tab] -

3 [tab] CPPA newsl
4 [tab] 82
5 [tab] (Sep 83):
6 [tab] 1-4
#
#
# END OF RECORD [23605]
Example 3
Original citation looks like this:
# BEGINNING OF RECORD [23638] line [37630]
#
#
*** Too many colons - :
#
#
_heading = PERIODICALS & PERIODICAL PUBLISHING
_sub1 = ROMANCE
_sub2 =
_sub3 =
_sub4 =
#
#
Allan, Joan. 'Maclean's Review: The Press: Innocent heroines and leering villains'. Maclean's 75 no 16 (11 Aug 62): 51 [The Family Journal ]
#
#
# END OF RECORD [23638]
This citation is straightforward to parse.
Parsed citation looks like this:
# BEGINNING OF RECORD [23638] line [37630]
#
#
*** Too many colons - :
#
#
_heading = PERIODICALS & PERIODICAL PUBLISHING
_sub1 = ROMANCE
_sub2 =
_sub3 =
_sub4 =
#
#
1 [tab] Allan, Joan.
2 [tab] 'Maclean's Review: The Press: Innocent heroines and leering villains'.
3 [tab] Maclean's
4 [tab] 75 no 16
5 [tab] (11 Aug 62):
6 [tab] 51
7 [tab] [The Family Journal ]
#
#
# END OF RECORD [23638]
Notes to remember
- you don't have to worry about extra punctuation or spaces in the fields; these can be removed automatically later
- if the citation parses into more than one citation, separate them with a blank line
- each bad_citation file should have at most 200 records
- save your work often in case anything goes wrong
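As a rough illustration of the first and third notes, the sketch below shows one way leftover punctuation and spaces might be stripped from a field, and how the number of records in a file could be counted. Both helpers are hypothetical; in particular, the set of trailing characters removed (space, period, comma, semicolon, colon) is an assumption, since the instructions do not say exactly what the later clean-up removes.

```python
import re


def clean_field(value):
    """Collapse runs of whitespace and drop trailing separators (assumed set)."""
    value = re.sub(r"\s+", " ", value).strip()
    return value.rstrip(" .,;:")


def count_records(text):
    """Count records by their '# BEGINNING OF RECORD' marker lines."""
    return sum(
        1 for line in text.splitlines()
        if line.strip().startswith("# BEGINNING OF RECORD")
    )
```

A file whose count_records result is above 200 would not match the limit stated in the notes and may be worth a second look before emailing it back.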