... to overengineer or not to overengineer ...
There is a simple task: crawl HTML pages, parse them, and return the parsed results. Sounds simple, and it is. In fact, it is just about 40 lines of code, comments included:
public async Task<ActionResult<MeasurementSet>>
    Get([FromQuery] DateTime? from = null, [FromQuery] DateTime? to = null)
{
    // Defaulting values
    from ??= DateTime.Today.AddDays(-1);
    to ??= DateTime.Today;
    // Defining URLs
    var query = $"beginn={from:dd.MM.yyyy}&ende={to:dd.MM.yyyy}";
    var baseUrl = new Uri("https://BaseURL/");
    using (var client = new HttpClient { BaseAddress = baseUrl })
    {
        // Collecting data
        return Ok(new MeasurementSet
        {
            Temperature = await GetMeasures(query, client, "wassertemperatur"),
            Level = await GetMeasures(query, client, "wasserstand"),
        });
    }
}
private static async Task<IEnumerable<Measurement>>
    GetMeasures(string query, HttpClient client, string apiName)
{
    // Retrieving the data
    var response = await client.GetAsync($"{apiName}/some/other/url/part/?{query}");
    var html = await response.Content.ReadAsStringAsync();
    // Parsing HTML response
    var bodyMatch = Regex.Match(html, "<tbody>(.*)<\\/tbody>");
    var rowsHtml = bodyMatch.Groups.Values.Last();
    return Regex.Matches(rowsHtml.Value, "<tr class=\"row2?\"><td >([^<]*)<\\/td><td class=\"whocares\">([^<]*)<\\/td>")
        // Building the results
        .Select(match => new Measurement
        {
            Date = DateTime.Parse(match.Groups[1].Value),
            Value = decimal.Parse(match.Groups[2].Value)
        });
}
It is not the best code: it would be complicated to maintain, extend, debug and test, and it does not follow the basic OOP principles (except maybe DRY). So let's refactor it (sources and history are placed here).
Refactoring
Treating the project as real-life code, let's try to fix and refactor it problem by problem.
1. Possible failures
The first step is error handling, as lots of things could go wrong. There is no logging at the moment, but these changes at least make the errors more descriptive:
if (!response.IsSuccessStatusCode)
{
    throw new Exception($"{apiName} gathering failed. [{response.StatusCode}] {html}");
}
// Parsing HTML response
var bodyMatch = Regex.Match(html, "<tbody>(.*)<\\/tbody>");
if (!bodyMatch.Success)
{
    throw new Exception($"Failed to define data table body. Content: {html}");
}
In fact, this was more of a functional improvement than a structural one.
2. Layers separation
All the logic is located in a single class, which is also an API controller, meaning it acts as the persistence, business and presentation layers at the same time. The next step is to separate the layers by introducing MeasureParser in the business layer and RawMeasuresCollector in the persistence layer.
URL building is part of the persistence logic, which is why the MeasureType enum, with the appropriate mapping, was introduced in the persistence layer:
var apiName = measure switch
{
    MeasureType.Temperature => "wassertemperatur",
    MeasureType.Level => "wasserstand",
    _ => throw new NotImplementedException($"Measure type {measure} not implemented")
};
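The post names the two new classes but does not list their full shapes. A minimal sketch of the split could look like the following; the member signatures match the mock setup in the test of step 5, but the assumption that CollectRawMeasurement returns the raw HTML as a string is mine:

```csharp
using System.Text.RegularExpressions;

// Persistence layer: fetching raw HTML is the collector's only concern.
public interface IRawMeasuresCollector
{
    Task<string> CollectRawMeasurement(MeasureType measure);
}

// Business layer: MeasureParser owns the parsing and no longer cares
// where the HTML comes from.
public class MeasureParser
{
    private readonly IRawMeasuresCollector _collector;

    public MeasureParser(IRawMeasuresCollector collector) => _collector = collector;

    public async Task<IEnumerable<Measurement>> GetMeasures(MeasureType measure)
    {
        var html = await _collector.CollectRawMeasurement(measure);
        // Same parsing logic as before, just moved out of the controller
        var bodyMatch = Regex.Match(html, "<tbody>(.*)<\\/tbody>");
        return Regex
            .Matches(bodyMatch.Groups.Values.Last().Value,
                "<tr class=\"row2?\"><td >([^<]*)<\\/td><td class=\"whocares\">([^<]*)<\\/td>")
            .Select(match => new Measurement
            {
                Date = DateTime.Parse(match.Groups[1].Value),
                Value = decimal.Parse(match.Groups[2].Value)
            });
    }
}
```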
3. Single Responsibility
There is also logic for building the URL from the provided dates or default values, and it should be separated as well. This is easy to do by introducing UrlQueryBuilder, which takes care of exactly this single task:
public class UrlQueryBuilder
{
    public DateTime From { get; set; } = DateTime.Today.AddDays(-1);
    public DateTime To { get; set; } = DateTime.Today;

    public string Build(MeasureType measure)
    {
        var query = $"beginn={From:dd.MM.yyyy}&ende={To:dd.MM.yyyy}";
        var apiName = measure switch
        {
            MeasureType.Temperature => "wassertemperatur",
            MeasureType.Level => "wasserstand",
            _ => throw new NotImplementedException($"Measure type {measure} not implemented")
        };
        return $"{apiName}/some/other/url/part/?{query}";
    }
}
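For illustration, here is what the builder produces for a fixed date range (the dates are picked only for this example):

```csharp
var builder = new UrlQueryBuilder
{
    From = new DateTime(2020, 11, 1),
    To = new DateTime(2020, 11, 2)
};

// Dates are rendered with the dd.MM.yyyy format the source site expects.
var url = builder.Build(MeasureType.Level);
// → "wasserstand/some/other/url/part/?beginn=01.11.2020&ende=02.11.2020"
```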
4. Reducing coupling
Coupling between the business and persistence layers is reduced by introducing an interface. A nice side effect is that we can start using dependency injection. On top of that, we can remove the hardcoded base URL from the code and move it into the settings:
var settings = Configuration.GetSection("AppSettings").Get<AppSettings>();
services.AddHttpClient(Constants.ClientName, client =>
{
    client.BaseAddress = new Uri(settings.BaseUrl);
});
services.AddTransient<IRawMeasuresCollector, RawMeasuresCollector>();
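The post does not show how RawMeasuresCollector consumes the named client registered above. A plausible sketch using IHttpClientFactory (the constructor shape and use of UrlQueryBuilder here are assumptions) would be:

```csharp
public class RawMeasuresCollector : IRawMeasuresCollector
{
    private readonly IHttpClientFactory _clientFactory;

    public RawMeasuresCollector(IHttpClientFactory clientFactory) =>
        _clientFactory = clientFactory;

    public async Task<string> CollectRawMeasurement(MeasureType measure)
    {
        // CreateClient resolves the client registered with AddHttpClient above,
        // so the base URL comes from configuration, not from code.
        var client = _clientFactory.CreateClient(Constants.ClientName);
        var response = await client.GetAsync(new UrlQueryBuilder().Build(measure));
        var html = await response.Content.ReadAsStringAsync();
        if (!response.IsSuccessStatusCode)
        {
            throw new Exception($"{measure} gathering failed. [{response.StatusCode}] {html}");
        }
        return html;
    }
}
```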
5. Testing
After all the steps above, we can finally mock the data collector interface and test the business logic, which consists of the HTML parsing and URL query building features. More code, fewer words; check the parsing test sample:
[TestMethod]
public async Task TestHtmlTemperatureParsing()
{
    var collector = new Mock<IRawMeasuresCollector>();
    collector
        .Setup(c => c.CollectRawMeasurement(MeasureType.Temperature))
        .Returns(Task.FromResult(_temperatureDataA));

    var actual = (await new MeasureParser(collector.Object)
        .GetMeasures(MeasureType.Temperature)
    ).ToArray();

    Assert.AreEqual(165, actual.Length);
    Assert.AreEqual(7.1M, actual
        .First(l => l.Date == DateTime.Parse("24.11.2020 10:15")).Value);
}
6. Increase flexibility
Unfortunately, the current source is not trusted: the URLs and the page DOM can change at any time, which means the structure should be flexible enough to handle that.
To resolve the issue of a possible source change, a factory for the measures collector was created; potentially, another implementation of IRawMeasuresCollector could read the HTML from another source.
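The factory itself is not listed in the post; under the assumption that its job is to carry the date range (with the same defaulting as before) into the collector, it might look like:

```csharp
public interface IMeasuresCollectorFactory
{
    IRawMeasuresCollector CreateCollector(DateTime? from, DateTime? to);
}

public class MeasuresCollectorFactory : IMeasuresCollectorFactory
{
    private readonly IHttpClientFactory _clientFactory;

    public MeasuresCollectorFactory(IHttpClientFactory clientFactory) =>
        _clientFactory = clientFactory;

    public IRawMeasuresCollector CreateCollector(DateTime? from, DateTime? to)
    {
        // Defaulting moved out of the controller: yesterday..today when not provided.
        return new RawMeasuresCollector(
            _clientFactory,
            from ?? DateTime.Today.AddDays(-1),
            to ?? DateTime.Today);
    }
}
```

A different source would then only require another IRawMeasuresCollector implementation behind the same factory.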
There are two different measure types, which are read from different pages, but there is only a single parsing algorithm. This separation of logic preserves the functionality if at some point the DOM of one of the pages changes. The controller's GET method then becomes:
public async Task<ActionResult<MeasurementSet>> Get(
    [FromQuery] DateTime? from = null, [FromQuery] DateTime? to = null)
{
    var parser = new MeasureParser(_collectorFactory.CreateCollector(from, to));
    return Ok(new MeasurementSet
    {
        Temperature = await parser.GetTemperature(),
        Level = await parser.GetLevel(),
    });
}
Additionally, this change allows removing the MeasureType enum, which is positive: it reduces the number of code branches (no switch, fewer branches), and it protects the code from runtime failures when a newly introduced type is not covered in the query building, since such a problem would now surface at build time.
Summary
As a result of our actions, the project grew from a single 40-line file to something bigger.
It is flexible and maintainable (according to Visual Studio it has a maintainability index of 87), safe, and fits the standards. But at the same time, the functionality is spread across different files, and the project seems bigger than it should be: there are only 67 executable lines of code in 277 lines of source code.
In my view, such a structure makes more sense in bigger projects, and the goal is to find an individual balance for each project.
Feel free to mention which step you would consider the last reasonable step of the refactoring, or to bring more improvement ideas if you see the need.