I want to extract state abbreviation (2 letters) and zip code (either 4 or 5 numbers) from the following string
address <- "19800 Eagle River Road, Eagle River AK 99577
907-481-1670
230 Colonial Promenade Pkwy, Alabaster AL 35007
205-620-0360
360 Connecticut Avenue, Norwalk CT 06854
860-409-0404
2080 S Lincoln, Jerome ID 83338
208-324-4333
20175 Civic Center Dr, Augusta ME 4330
207-623-8223
830 Harvest Ln, Williston VT 5495
802-878-5233
"
For the zip code, I tried few methods that I found on here but it didn't work mainly because of the 5 number street address or zip codes that have only 4 numbers
text <- readLines(textConnection(address))
library(stringi)
zip <- stri_extract_last_regex(text, "\\d{5}")
zip
library(qdapRegex)
rm_zip3 <- rm_(pattern="(?<!\\d)\\d{5}(?!\\d)", extract = TRUE)
zip <- rm_zip3(text)
zip
[1] "99577" "1670" "35007" "0360" "06854" "0404" "83338" "4333" "4330" "8223" "5495" "5233" NA
For the state abbreviation, I have no idea how to extract
Any help is appreciated! Thanks in advance!
Edit 1: Include phone numbers
states <- str_extract(text, "\\b[A-Z]+(?=\\s+\\d{5}$)")
– Wiktor Stribiżew May 4 '17 at 17:32"AK" "AL" "CT" "ID" NA NA
– IloveCatRPython May 4 '17 at 17:35{5}
to{4,5}
like this:states <- str_extract(text, "\\b[A-Z]+(?=\\s+\\d{5}$)")
– degant May 4 '17 at 17:39