|| Feedback | Index to site | Removing postcodes from addresses (Access) ||
If you import a text file of data into an Excel worksheet you may wish to extract a particular range of characters from each record. For example if each data record (or row) looks something like 'A100 John 23.5 South' you may wish to extract the number (23.5) which is the third out of four items in the string. As always, there are several ways of doing this.
The most obvious method might be to parse the data. The ribbon option [Data] Data Tools | Text to Columns starts a wizard which will guide you through the process of splitting up a range of data into columns. The wizard prompts you to describe the character which separates each piece of data within the string - in the example above that would be a space. The data is then rewritten on your worksheet, broken up over as many columns as required.
|33||A100 ABP 0017 F||ABP|
|34||B231 CIM 0041 M||CIM|
|35||C031 HCC 1032 F||HCC|
[B33] = Mid(A33, 6, 3)
You may however choose to retain the data in its original joined format. If the data you want to extract is always in exactly the same place in each row you can simple use the MID() function to obtain it. The function takes the format =MID(TextRef, StartPosition, NumberOfCharacters) where TextRef is a cell reference containing the text; StartPosition is the character number at which you wish to start extracting; NumberOfCharacters is the length of the element you wish to read from the text.
[B39] =MID(A39, FIND(";", A39, 7)+1,
FIND(";", A39, FIND(";", A39,
7)+1)-FIND(";", A39, 7)-1)
If the item is not always in exactly the same character position within each text string you can incorporate the FIND() function as parameters within the MID() function. This method works if you have a reasonable idea about where the required text is found but it quickly becomes a long and complicated formula.
The FIND() function takes the format =FIND(SearchItem, TextString, StartPos) where SearchItem is the character(s) you wish to find within the TextString, starting your search at the character in position StartPos. It returns a number representing the position at which the SearchItem is found.
Would you like to learn more?
Note that FIND() is case sensitive which means that searching for 'abc' will not recognise 'ABC'. To perform the same task without case sensitivity, use the similar SEARCH() function.
You may find however that the word or item you wish to extract is not in a regular position and not of a uniform length. Also the positioning may need to be counted from the end rather than the beginning (e.g. the penultimate word in a string containing a varied number of words). In these circumstances the built in Excel functions probably can not cope and some programming is required.
The final example applies a User Defined function (a visual basic macro) to extract a particular word or series of characters from a text string. It takes the form: =MYEXTRACT(TextString, WordNumber, FrontOrBack, Separator) where the TextString is a phrase of words or items; WordNumber is a count of words or items from the front or end of the phrase; FrontOrBack is the character 'F' or 'B' to signify which end of the phrase you are counting from; the optional Separator represents the character which separates each word or item. In this example it is a space but it could be defined as a comma or semi colon.
|45||Abby Monday 8 Surgery||8|
|46||Mrs Plumstead Friday 30.5 Pain||30.5|
|47||Dr Jones Wednesday 103 Trauma||103|
[B45] =MyExtract(A45, 2, "B", " ")
In example 3 the function takes the phrase in column A and extracts the second word from the end, where spaces separate each item. You may wish to apply the VALUE() function to convert the text result to a number.
The VBA macro is shown below. This must be copied into a visual basic module.
|Function MyExtract(MyText As String, ItemNo As Integer, FrontOrBack As String, Optional MySeparator As String) As String
' This function will scan a string containing at least TWO words, each separated
' by a specified character and extract one of the words. It can extract the word
' (or SubString) counting from either the Back or Front of the text.
Dim LenText As Integer, n As Integer, CountSpaces As Integer
Dim MySt As Integer, MyFin As Integer, MyStep As Integer, Mk1 As Integer, Mk2 As Integer
' MySeparator was an optional parameter
If Len(MySeparator) = 0 Then MySeparator = " "
LenText = Len(MyText)
' You cannot extract a word if length is LT 3 chars
If LenText < 3 Then
MyExtract = "*"
' set the direction in which the text is examined
If UCase(FrontOrBack) = "F" Then
MySt = 2
MyFin = LenText - 1
MyStep = 1
MyFin = 2
MySt = LenText - 1
MyStep = -1
' identify the position of characters matching the separator
For n = MySt To MyFin Step MyStep
If Mid(MyText, n, 1) = MySeparator Then
CountSpaces = CountSpaces + 1
If CountSpaces = ItemNo - 1 Then Mk1 = n
If CountSpaces = ItemNo Then Mk2 = n
If CountSpaces = 0 Then
MyExtract = "*"
If UCase(FrontOrBack) = "B" Then
n = Mk1
Mk1 = Mk2
Mk2 = n
If Mk2 = 0 Then Mk2 = LenText + 1
Mk1 = Mk1 + 1
MyExtract = Mid(MyText, Mk1, Mk2 - Mk1)
There is also a visual basic function for splitting a long string of text into several shorter lengths with appropriate breaks at the ends of words. Link to MakeStr()
|Get this information as a document
accompanied by Excel worksheets
|Click here for details about
file. It has been rewritten for Excel 2010.
|file: XLExtractText.htm||© meadinkent.co.uk 2014||Page last updated Jan14||CMIDX S3 P6 Y|