In my previous post I mentioned several ways of defining regular expressions in Groovy. Here I want to show how we can use Groovy regexes to find data in files.

## Parsing properties file (simplified) [1]

Data: each line in the file has the same structure; the entire line can be matched by a single regex.

Task: transform each line into an object.

Solution: construct a regex with capturing parentheses, apply it to each line, and extract the captured data.

Demonstrates: File.eachLine method, matrix syntax of the Matcher object.
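A minimal sketch of this approach might look like the following. The sample data, the temp file and the regex itself are assumptions made for illustration; the only comment handling is that lines starting with # do not match.

```groovy
// Hypothetical sample data, written to a temp file so the sketch is self-contained
def file = File.createTempFile('sample', '.properties')
file.deleteOnExit()
file.text = '''\
# hypothetical sample
name = groovy
version=1.6
'''

// One regex for the whole line: key, '=', value (key and value are captured)
def regex = /^\s*([^#=\s]+)\s*=\s*(.*?)\s*$/

def props = [:]
file.eachLine { line ->
    def matcher = line =~ regex
    if (matcher) {
        // Matrix syntax: the first index selects the match, the second selects the group
        props[matcher[0][1]] = matcher[0][2]
    }
}
```

With this sample, props ends up holding name → groovy and version → 1.6; the comment line is skipped because the regex does not match it.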

## Parsing CSV files (simplified) [2]

Data: each line in the file has the same structure; the line consists of blocks separated by some character sequence.

Task: transform each line into a list of objects.

Solution: construct a regex with capturing parentheses, then parse each line with the regex in a loop, extracting the captured data.

Demonstrates: ~// Pattern definition, Matcher.group method, \G regex meta-sequence.
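Here is one way the loop could be sketched. The comma separator, the sample lines and the pattern are assumptions, and the pattern is deliberately naive: it knows nothing about quoted fields and does not cope with an empty first field.

```groovy
// ~/.../ compiles the slashy string into a java.util.regex.Pattern.
// \G anchors every match to the end of the previous one, so nothing is skipped.
def fieldPattern = ~/\G(?:^|,)([^,]*)/

def parseLine = { String line ->
    def fields = []
    def matcher = fieldPattern.matcher(line)
    while (matcher.find()) {
        fields << matcher.group(1)   // group 1 is the field without the separator
    }
    fields
}

// Hypothetical lines standing in for the content of a CSV file
def rows = ['a,b,,d', 'x,y,z,'].collect(parseLine)
```

Each element of rows is the list of fields for one line, with empty strings for the empty fields in the middle and at the end.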

## Finding snapshot dependencies in the POM (simplified) [3]

Data: the file contains blocks with known boundaries (possibly spanning multiple lines).

Task: extract the blocks satisfying some criteria.

Solution: read the entire file into a string, construct a regex with capturing parentheses, and apply the regex to the string in a loop.

Demonstrates: File.text property, list syntax of the Matcher object, named capture, global (?x) regex modifier, local (?s) regex modifier.
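The sketch below shows how these pieces could fit together. The POM fragment, the coordinates in it and the regex are made up for illustration, and the named groups assume Java 7 or later. The regex expects groupId, artifactId and version to appear in exactly that order, which a real POM does not guarantee.

```groovy
// Hypothetical POM fragment, written to a temp file so that File.text can be used
def pomFile = File.createTempFile('pom', '.xml')
pomFile.deleteOnExit()
pomFile.text = '''\
<dependencies>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>core</artifactId>
    <version>1.0-SNAPSHOT</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>util</artifactId>
    <version>2.1</version>
  </dependency>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>web</artifactId>
    <version>0.3-SNAPSHOT</version>
  </dependency>
</dependencies>
'''

def regex = '''(?x)                                # global modifier: whitespace and comments are ignored
    <dependency>                                      \\s*
    <groupId>    (?<groupId>    [^<]+ ) </groupId>    \\s*
    <artifactId> (?<artifactId> [^<]+ ) </artifactId> \\s*
    <version>    (?<version>    [^<]*-SNAPSHOT ) </version>
    (?s: .*? )                                     # local modifier: only this dot matches newlines
    </dependency>
'''

def matcher = pomFile.text =~ regex

// List-style access: each match is a list [whole match, groupId, artifactId, version]
def byIndex = matcher.collect { it[1..3].join(':') }

// Named capture: walk the matches again and read the groups by name
matcher.reset()
def byName = []
while (matcher.find()) {
    byName << [matcher.group('groupId'), matcher.group('artifactId'), matcher.group('version')].join(':')
}
```

Both byIndex and byName contain the two snapshot coordinates; the release dependency is skipped because its version does not end in -SNAPSHOT.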

## Finding stacktraces in the log

Data: the file contains entries, each of which starts with the same pattern and can span multiple lines. A typical example is a log4j log file:

    2009-10-16 15:32:12,157 DEBUG [com.ndpar.web.RequestProcessor] Loading user
        at org.hibernate.impl.SessionImpl.get(SessionImpl.java:839)
        at org.hibernate.impl.SessionImpl.get(SessionImpl.java:835)
        at org.springframework.orm.hibernate3.HibernateTemplate$1.doInHibernate(HibernateTemplate.java:531)
        at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:419)
        at org.springframework.orm.hibernate3.HibernateTemplate.executeWithNativeSession(HibernateTemplate.java:374)
        at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:525)
        at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:519)
        at com.ndpar.dao.UserManager.getUser(UserManager.java:90)
        ... 62 more
    2009-10-16 15:32:14,659 DEBUG [com.ndpar.jms.MessageListener] Received message:
    ... multi-line message ...
    2009-10-16 15:32:15,169 INFO  [com.ndpar.dao.UserManager] User: ...


Task: find the entries satisfying some criteria.

Solution: read the entire file into a string [4], construct a regex with capturing parentheses and lookahead, split the string into entries, then loop through the result and apply the criteria to each entry.

Demonstrates: regex interpolation, combined global regex modifiers (?sm).
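As a rough sketch, the split-and-filter step could be written like this. The log text is an abbreviated copy of the sample above, held in a string instead of being read with File.text, and the criterion (DEBUG entries that contain a stack trace line) is just an assumed example.

```groovy
// Abbreviated copy of the sample log; in practice this would come from File.text
def log = '''\
2009-10-16 15:32:12,157 DEBUG [com.ndpar.web.RequestProcessor] Loading user
    at org.hibernate.impl.SessionImpl.get(SessionImpl.java:839)
    at com.ndpar.dao.UserManager.getUser(UserManager.java:90)
    ... 62 more
2009-10-16 15:32:14,659 DEBUG [com.ndpar.jms.MessageListener] Received message:
... multi-line message ...
2009-10-16 15:32:15,169 INFO  [com.ndpar.dao.UserManager] User: ...
'''

// Every entry starts with a timestamp at the beginning of a line
def entryStart = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/

// Regex interpolation: entryStart is embedded into a larger pattern.
// The lookahead matches the position before a timestamp without consuming it,
// so each entry keeps its own first line. findAll drops any empty chunk.
def entries = log.split(/(?m)(?=${entryStart})/).findAll { it }

// Combined modifiers: (?s) lets '.' cross line breaks, (?m) lets '^' match at every line start
def stacktrace = /(?sm)${entryStart} DEBUG .*^\s*at /

def withTrace = entries.findAll { it =~ stacktrace }
```

Splitting on a zero-width lookahead keeps the timestamp with its entry, which is why the same entryStart fragment can be reused in the filtering regex.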

### Footnotes

1. This example is for demonstration purposes only. In a real program you would just use the Properties.load method.
2. The regex is simplified. If you want the real one, take a look at Jeffrey Friedl’s example.
3. Again, in reality you would find snapshots using the mvn dependency:resolve | grep SNAPSHOT command.
4. This approach won’t work for large files. Take a look at this script for a practical solution.