


Mystic Help

BISON: Text Importer

The BISON Text Importer pre-processes formatted text files, and also some proprietary formats that can be interpreted and processed as text (e.g. the result of an Excel spreadsheet import). To ease understanding, the dialog mimics the layout of the import dialogs commonly found in third-party spreadsheet applications.

Initial Format Processing

Text files, especially those produced on different operating systems (which use different end-of-line characters), can load into the importer in different ways. The first step is therefore to adjust the import settings until the list of results at the bottom of the dialog shows a logical data set rather than 'garbage'.
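As a rough illustration of why the end-of-line characters matter, the following Python sketch counts the three common line-ending styles in the raw bytes of a file. It is not part of the Importer; the function names `count_line_endings` and `guess_record_separator` are hypothetical and exist only for this example.

```python
def count_line_endings(raw: bytes) -> dict:
    """Count each line-ending style in the raw bytes of a file."""
    crlf = raw.count(b"\r\n")        # Carriage-Return/Newline pairs
    lf = raw.count(b"\n") - crlf     # Newline not preceded by Carriage-Return
    cr = raw.count(b"\r") - crlf     # Carriage-Return not followed by Newline
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def guess_record_separator(raw: bytes) -> str:
    """Return the name of the most frequent style (ties favour the earliest key)."""
    counts = count_line_endings(raw)
    return max(counts, key=counts.get)
```

Running `guess_record_separator` on the first few kilobytes of a file gives a reasonable starting value to try before adjusting the setting by eye in the dialog.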

The three main values for the initial understanding of the text are:

  • Record Separator - The record separator1) is the (non-visible) character used to separate lines in a text file. In a word-processed document, this is effectively the character(s) inserted into the file when you press the [Enter] key on the keyboard. Files produced on different operating systems or by different programs, however, may use different record separators. Change the record separator until the list box presents a set of clearly defined rows.
  • Field Separator - The field separator is the character that divides the fields within a single row. By definition, a CSV2) file needs a comma and a TSV3) file needs a TAB character.
  • Field Protector - Because a field may contain characters that confuse the file format (e.g. a literal comma inside a field would create a spurious field break), fields are often 'protected' by wrapping them in a known character. The protector most commonly used in both CSV and TSV files is the double-quote.
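The interaction of the three settings above can be sketched in Python using the standard `csv` module. This is a minimal illustration under stated assumptions, not the Importer's actual implementation; the function name `parse_text` and its parameter names are hypothetical.

```python
import csv


def parse_text(text, record_sep="\n", field_sep=",", protector='"'):
    """Split text into rows of fields using the three settings.

    Simplification: a protected field that itself contains the record
    separator would be split across rows by this sketch.
    """
    rows = []
    for record in text.split(record_sep):
        if record == "":
            continue  # skip the empty piece after a trailing separator
        # csv.reader accepts any iterable of strings, here a single record
        rows.extend(csv.reader([record], delimiter=field_sep, quotechar=protector))
    return rows
```

With the right settings, `parse_text('name,city\r\n"Smith, J",Leeds\r\n', record_sep="\r\n")` keeps `Smith, J` as one field because the double-quote protects the comma. With the wrong field separator, for example a comma on a TAB-separated row, each row collapses into a single field, which is the kind of 'garbage' the results list would show.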

For an Excel spreadsheet import, the Importer already knows how to read the standard internal structure of an Excel file. The Record Separator, Field Separator and Field Protector are therefore locked, and the data is imported without any need to change these values.

Three additional settings may also modify how the data is displayed on the screen:

  • Internal Delimiter - Most fields that need to be imported are self-contained, single values. On occasion, however, an import file may contain a range of data within a single field. An example of this is an Item export from another Library Management system, which may place a series of Authors into a single field in the file. The Internal Delimiter allows you to define which character should be used within a field to further sub-divide the data into separate values.
  • Display Sample - Because a file may be of considerable size, by default the Importer displays only the first 50 rows as a sample, which can be used to check that the settings above are producing the correct results. Once the import looks correct, try changing the Display Sample to [All Data] to check that all records can be brought into the Importer without corruption4).
  • Use 1st Row as Headings - Many CSV and TSV files, as well as many Excel files, use the first row to hold headings explaining what the fields in the file mean. By default, this check-box is on. If data appears in the heading row of the results list, turn this check-box off.
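The effect of these three display settings can be sketched in Python as follows. The function `preview`, its parameters and the sample data are all hypothetical, and the sketch assumes that the 50-row sample is counted after the heading row has been taken off, which the Importer may or may not do.

```python
from itertools import islice


def preview(rows, sample_size=50, first_row_is_heading=True, internal_delim=None):
    """Return (headings, sample rows) from already-parsed rows.

    sample_size=None stands in for the [All Data] option.
    """
    rows = iter(rows)
    headings = next(rows, None) if first_row_is_heading else None
    sample = rows if sample_size is None else islice(rows, sample_size)
    result = []
    for row in sample:
        if internal_delim is not None:
            # Sub-divide only those fields that contain the internal delimiter
            row = [
                field.split(internal_delim) if internal_delim in field else field
                for field in row
            ]
        result.append(row)
    return headings, result
```

For example, with made-up rows `[["Title", "Authors"], ["Book A", "Author One;Author Two"]]` and `internal_delim=";"`, the Authors field of the data row is split into two separate values while the Title field is left as a single value.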



1)
Note that the example in the image was produced on an Apple Mac. If your file was produced on a Windows machine, it is more likely to need Carriage-Return/Newline as the record separator rather than Newline by itself.
2)
Comma-Separated Values
3)
Tab-Separated Values
4)
On occasion, files are exported with foreign characters etc., which can prevent data after that point from being read correctly.
pergamonmystic/linkedhelp/bison_text_importer.1633544025.txt.gz · Last modified: 2021/10/06 18:13 by admin