Parsing files using Groovy regex
In my previous post I described several ways of defining regular expressions in Groovy. Here I want to show how to use Groovy regexes to extract data from files.
Parsing a properties file (simplified) [1]
Data: each line in the file has the same structure; the entire line can be matched by a single regex.
Task: transform each line into an object.
Solution: construct a regex with capturing parentheses, apply it to each line, and extract the captured data.
Demonstrates: File.eachLine method, matrix syntax of the Matcher object.
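Here is a minimal sketch of that recipe. The sample data, the temp file, and the names regex and props are my own assumptions for illustration; the regex only handles plain key=value lines and skips everything else.

```groovy
// Hypothetical sample data, written to a temp file so the sketch is self-contained
def file = File.createTempFile('sample', '.properties')
file.deleteOnExit()
file.text = 'name=John\ncity = Toronto\n# a comment, which the regex skips\n'

// Two capturing groups: the key, and everything after '=' as the raw value
def regex = /^\s*([^#=\s]+)\s*=(.*)/

def props = [:]
file.eachLine { line ->
    def matcher = line =~ regex
    if (matcher) {
        // Matrix syntax: matcher[0] is the first match, [1] and [2] are its groups
        props[matcher[0][1]] = matcher[0][2].trim()
    }
}
println props
```

Trimming the value in code instead of in the regex keeps the pattern short; that is a design choice of this sketch, not a requirement.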
Parsing CSV files (simplified) [2]
Data: each line in the file has the same structure; the line consists of blocks separated by some character sequence.
Task: transform each line into a list of objects.
Solution: construct a regex with capturing parentheses, then parse each line with the regex in a loop, extracting the captured data.
Demonstrates: ~// Pattern definition, Matcher.group method, \G regex meta-sequence.
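The loop could look roughly like this. The sample rows and the names field and rows are assumptions of mine, and the regex is deliberately naive: it knows nothing about quoted fields, and it is not meant for lines that start with an empty field.

```groovy
import java.util.regex.Matcher

// Hypothetical sample data, written to a temp file so the sketch is self-contained
def file = File.createTempFile('sample', '.csv')
file.deleteOnExit()
file.text = 'id,name,city\n1,John,Toronto\n2,Jane,\n'

// ~// creates a java.util.regex.Pattern.
// \G anchors every match at the position where the previous match ended,
// so the loop cannot silently skip over unexpected input.
def field = ~/\G(?:^|,)([^,]*)/

def rows = []
file.eachLine { line ->
    Matcher matcher = field.matcher(line)
    def row = []
    while (matcher.find()) {
        row << matcher.group(1)   // the field without its leading comma
    }
    rows << row
}
rows.each { println it }
```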
Finding snapshot dependencies in the POM (simplified) [3]
Data: the file contains blocks with known boundaries (possibly spanning multiple lines).
Task: extract the blocks satisfying some criteria.
Solution: read the entire file into a string, construct a regex with capturing parentheses, and apply the regex to the string in a loop.
Demonstrates: File.text property, list syntax of the Matcher object, named capture, global (?x) regex modifier, local (?s) regex modifier.
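A sketch of how this might look, with the modifiers written in Java's embedded-flag form: (?x) at the start of the pattern for comments mode, and (?s: ... ) to switch on dotall for one group only. The sample POM, the group names artifact and version, and the snapshots list are assumptions for illustration. The sketch reads the named groups with Matcher.group rather than the list syntax.

```groovy
// Hypothetical sample POM, written to a temp file so the sketch is self-contained
def pom = File.createTempFile('pom', '.xml')
pom.deleteOnExit()
pom.text = '''\
<project>
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>core</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.7</version>
    </dependency>
  </dependencies>
</project>
'''

def regex = '''(?x)                         # global: whitespace and comments in the pattern are ignored
    <dependency>
    (?s: (?: (?! </dependency> ) . )*? )    # local: only in this group may the dot cross line breaks
    <artifactId> (?<artifact> [^<]+ ) </artifactId>
    \\s*
    <version> (?<version> [^<]*-SNAPSHOT ) </version>
'''

// File.text reads the whole file into one string
def matcher = pom.text =~ regex
def snapshots = []
while (matcher.find()) {
    // Named captures are read by name instead of by position
    snapshots << matcher.group('artifact') + ':' + matcher.group('version')
}
println snapshots
```

The negative lookahead inside the dotall group keeps a match from running past the end of its own dependency block.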
Finding stacktraces in the log
Data: the file contains entries, each of which starts with the same pattern and can span multiple lines. A typical example is a log4j log file:
2009-10-16 15:32:12,157 DEBUG [com.ndpar.web.RequestProcessor] Loading user
2009-10-16 15:32:13,258 ERROR [com.ndpar.web.UserController] id to load is required for loading
java.lang.IllegalArgumentException: id to load is required for loading
at org.hibernate.event.LoadEvent.<init>(LoadEvent.java:74)
at org.hibernate.event.LoadEvent.<init>(LoadEvent.java:56)
at org.hibernate.impl.SessionImpl.get(SessionImpl.java:839)
at org.hibernate.impl.SessionImpl.get(SessionImpl.java:835)
at org.springframework.orm.hibernate3.HibernateTemplate$1.doInHibernate(HibernateTemplate.java:531)
at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:419)
at org.springframework.orm.hibernate3.HibernateTemplate.executeWithNativeSession(HibernateTemplate.java:374)
at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:525)
at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:519)
at com.ndpar.dao.UserManager.getUser(UserManager.java:90)
... 62 more
2009-10-16 15:32:14,659 DEBUG [com.ndpar.jms.MessageListener] Received message:
... multi-line message ...
2009-10-16 15:32:15,169 INFO [com.ndpar.dao.UserManager] User: ...
Task: find entries satisfying some criteria.
Solution: read the entire file into a string [4], construct a regex with capturing parentheses and lookahead, split the string into entries, then loop through the result and apply the criteria to each entry.
Demonstrates: regex interpolation, combined global regex modifiers (?s) and (?m).
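One way to sketch this, using Java's embedded-flag form (?sm) for the combined modifiers. The shortened sample log, the variable names, and the criterion itself (an ERROR entry with at least one "at ..." frame) are assumptions for illustration.

```groovy
// Hypothetical, shortened sample log, written to a temp file so the sketch is self-contained
def file = File.createTempFile('sample', '.log')
file.deleteOnExit()
file.text = '''\
2009-10-16 15:32:12,157 DEBUG [com.ndpar.web.RequestProcessor] Loading user
2009-10-16 15:32:13,258 ERROR [com.ndpar.web.UserController] id to load is required for loading
java.lang.IllegalArgumentException: id to load is required for loading
    at com.ndpar.dao.UserManager.getUser(UserManager.java:90)
2009-10-16 15:32:15,169 INFO [com.ndpar.dao.UserManager] User: ...
'''

// Every entry starts with a log4j timestamp; this sub-regex is reused below
def timestamp = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/

// Regex interpolation: $timestamp is expanded inside the slashy string.
// The lookahead finds the start of each entry without consuming any text.
String boundary = /(?m)(?=^$timestamp )/

// (?s) lets the dot cross line breaks, (?m) lets ^ match at every line start
String errorWithTrace = /(?sm)^$timestamp ERROR .*^\s+at .*/

// File.text reads the whole file; findAll { it } drops empty strings, if any
def entries = file.text.split(boundary).findAll { it }
def stacktraces = entries.findAll { it ==~ errorWithTrace }
stacktraces.each { println it }
```

Splitting on a lookahead keeps the timestamp at the head of each entry, so the criterion can be checked against the complete entry text.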
Resources
- Groovy regexes
- Groovy one-liners
- Using String.replaceAll method
Footnotes
1. This example is for demonstration purposes only. In a real program you would just use the Properties.load method.
2. The regex is simplified. If you want the real one, take a look at Jeffrey Friedl’s example.
3. Again, in reality you would find snapshots using the mvn dependency:resolve | grep SNAPSHOT command.
4. This approach won’t work for big files. Take a look at this script for a practical solution.