Linked data (Python/LINDAS/SPARQL/hydrodata)

TL;DR;

Make a map of the original and/or fixed coordinates of the hydrological stations of the BAFU using the data from Lindas

The above map illustrates the data we can extract from LINDAS with a SPARQL query against the hydrological data published there. As you can see, there are two sets of points: one in Switzerland (the corrected version) and one in the Mediterranean based on the “raw” LINDAS data (a 400 km offset, for some reason).

In my last post, I wanted to see how to get the current waterbody data from lindas.admin.ch in one table, including geometry information. Now we of course want to use Python to investigate the data; additionally, we want to extend our existing hydrodata SPARQL query so it can also handle a subset of stations instead of all of them.

import datetime
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


# The prefixes, mainly to have them separate
PREFIXES = """
PREFIX schema: <http://schema.org/>
PREFIX hg: <http://rdf.histograph.io/>
PREFIX la: <https://linked.art/ns/terms/>
PREFIX ex: <http://example.org/>

# hydro related prefixes
PREFIX h: <https://environment.ld.admin.ch/foen/hydro/>
PREFIX hd: <https://environment.ld.admin.ch/foen/hydro/dimension/>
PREFIX hgs: <https://environment.ld.admin.ch/foen/hydro/station/>
PREFIX river: <https://environment.ld.admin.ch/foen/hydro/river/observation/>
PREFIX lake: <https://environment.ld.admin.ch/foen/hydro/lake/observation/>

# geo related prefixes
PREFIX wkt: <http://www.opengis.net/ont/geosparql#asWKT>

# hydrodata related prefixes (fake) that we use to reference the historical data. haha.
PREFIX hydrodata: <https://www.hydrodaten.admin.ch/plots/>

"""


## a function that converts the response into a dataframe, tailored to the hydrodata
# and that also contains a small validation step to remove "outdated" values
def _response2dataframe(response,
                        validate=True,
                        time_threshold=datetime.timedelta(minutes=30)):
    # return pd.read_csv(io.StringIO(resp.text))
    data = response.text.strip().split('\n')
    columns = data[0].split(',')
    data = [i.strip().split(',') for i in data[1:]]
    index = [i[0] for i in data]
    ix = columns.index('data')
    values = [[i[0]] + i[ix].split(';') for i in data]
    data = [[j for _, j in enumerate(i) if _ != ix]
            for i in data]
    # the dataframe holding all non measurement values
    columns.pop(ix)
    data1 = pd.DataFrame(columns=columns,
                        data=data,
                        index=index)

    for col in data1.columns:
        data1[col] = pd.to_numeric(data1[col], errors='ignore')

    def _list2dict(datalist):
        return {i: j for i, j in zip(datalist[::2], datalist[1::2])}
    values = {i[0]: _list2dict(i[1:])
              for i in values}
    ix = values.keys()
    columns = []

    for i in values.items():

        vartypes = list(i[1].keys())

        for vartype in vartypes:
            if vartype in columns:
                pass
            else:
                columns.append(vartype)
    data = [[] for i in columns]

    for i in ix:
        for colno, colname in enumerate(columns):
            value = values[i][colname] if colname in values[i].keys() else ''
            data[colno].append(value)
    data = np.transpose(data)

    data2 = pd.DataFrame(columns=columns, data=data, index=ix)
    for col in data2.columns:
        if col != 'measurementTime':
            data2[col] = pd.to_numeric(data2[col], errors='coerce')

    data = pd.concat([data1, data2], axis=1)
    data['query_time'] = pd.to_datetime(data['query_time'])
    data['measurementTime'] = pd.to_datetime(data['measurementTime'])

    if validate:
        time_threshold = pd.to_timedelta(time_threshold)
        good = data['query_time'] <= data['measurementTime'] + time_threshold
        # for illustration
        bad = data['query_time'] >= data['measurementTime'] + time_threshold
        data['measurementAge'] = data['query_time'] - data['measurementTime']
        data = data[good]
    return data



def get_hydrodata(station_ids=None,
                  return_as_dataframe=True,
                  validate=True,
                  time_threshold=datetime.timedelta(minutes=30)):
    query = [
        """# what we want to get out of the query as "nicely" formatted data
       SELECT ?station_id ?station_name ?station_type ?station_type_nice
              ?latitude ?longitude ?station_iri ?station_data_iri ?station_geometry_iri
              (group_concat(?measurement_type_measurement_value;SEPARATOR=";") as ?data)
              ?query_time
              # add the https://hydrodaten.admin.ch url for the data that go back 7 days (actually 40 days sometimes, see dev console)
             (if(contains(?data,"discharge"), uri(concat(str(hydrodata:), 'p_q_7days/', ?station_id, '_p_q_7days_de.json')), "")
              as ?hydrodata_url_discharge_waterlevel)
             (if(contains(?data,"temp"), uri(concat(str(hydrodata:), 'temperature_7days/', ?station_id, '_temperature_7days_de.json')), "")
              as ?hydrodata_url_temperature)
       """
        # where part including initial setup of measurement types
        """
          WHERE { ?station ?prop ?value.
                 filter( ?prop in (ex:isLiter, hd:measurementTime, hd:waterTemperature,  hd:waterLevel, hd:discharge))
        """,
        # generating data and variables part
        """
        bind(replace(replace(
             str(?station), str(river:), "", "i"),
             str(lake:), "", "i")
            as ?station_id)
        # the above is equivalent to this in theory but the above uses prefixes so would be easier to change
        # in case of changes
        # bind(strafter(str(?station), "observation/") as ?station_id2)
        # get the kind of station this is (whether lake or river) - the hierarchy/ontology here is weird as all are measurement stations
        # or observation but the distinction is river/lake first then observation
        bind(strbefore(strafter(str(?station), str(h:)), "/") as ?station_type)
        # uppercase the first letter -> why is there no capitalize function in SPARQL :throw_table:
        bind(concat(ucase(substr(?station_type,1,1)), substr(?station_type,2,strlen(?station_type)-1)) as ?station_type_nice)
        # remove some filler words from the data
        bind(replace(str(?prop), str(hd:), "", "i") as ?measurement_type)
        # convert the actual measurement value to a literal
        bind(str(?value) as ?measurement_value)
        # introduce the time of the query to (maybe) see if the values are old
        bind( str(now()) as ?query_time)
        # combine the type of measurement with the value so we can group by and not lose information
        bind(concat(?measurement_type, ';', ?measurement_value, '') as ?measurement_type_measurement_value)

        # start making IRIs so we can check/link back to lindas
        # the IRI containing information of the station itself (not linked to river/lake)
        bind(IRI(concat(str(hgs:), ?station_id)) as ?station_iri)
        # generate the geometry IRI
        bind(IRI(concat(str(hgs:), concat(?station_id,"/geometry"))) as ?station_geometry_iri)
        # generate the station IRI that holds the data information
        bind(IRI(concat(str(h:), ?station_type, "/observation/", ?station_id)) as ?station_data_iri)
        # get the information about the location of the station
        ?station_geometry_iri wkt: ?coordinates.
        # also get the name
        ?station_iri schema:name ?stat_name.
        bind(replace(?stat_name, ',', ';') as ?station_name)

        # simplify the geometry to separate lon/lat values
        BIND(STRBEFORE(STRAFTER(STR(?coordinates), " "), ")") AS ?latitude)
        BIND(STRBEFORE(STRAFTER(STR(?coordinates), "("), " ") AS ?longitude)
    }""",

    # grouping part
    """
    group by # the station information
    ?station_id ?station_type ?station_type_nice ?station_name
    ?latitude ?longitude
    # the IRIs
    ?station_iri ?station_data_iri ?station_geometry_iri
    # validation information
    ?query_time
    """,
    # sorting part
    """
    order by ?station_id
    """

    ]
    if station_ids is not None:
        # by default, get all stations; if specific stations are given, use
        # this modification instead
        station_ids = station_ids if isinstance(
            station_ids, list) else list(station_ids)

        station_ids = '"' + '","'.join([str(i) for i in station_ids]) + '"'
        query.insert(1, f'filter (?station_id in ({station_ids}))')

    query = PREFIXES + '\n'.join(query)
    resp = requests.post("https://ld.admin.ch/query",
                         data="query=" + query,
                         headers={
                             "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
                             "Accept": "text/csv"
                         }
                         )

    resp.encoding = "utf-8"

    if return_as_dataframe:
        resp = _response2dataframe(resp,
                                   validate=validate,
                                   time_threshold=time_threshold)

    return resp, query
    
if __name__ == '__main__':
    df, query = get_hydrodata()
    subset = [2091, 2613]
    df_subset, query_subset = get_hydrodata(station_ids=subset,
                                            return_as_dataframe=True)

A first approach is given by the LINDAS help, which showcases the use of a typical request together with pandas. The example is straightforward, but not representative of how the data we actually want to work with in pandas will look (more than 3 columns …). As we’ve seen in the last exploration of LINDAS, we can write a query that gives us one row per station of the BAFU hydrological network. To stay consistent with the example, we want a CSV response, which means replacing all commas with something better (the station names in particular sometimes contain a comma). Other than that, the query we used before also works directly via Python, and we only extended it with the option to filter by station_ids by adding the below (which inserts a very simple filter into the query):

    if station_ids is not None:
        # by default, get all stations; if specific stations are given, use
        # this modification instead
        station_ids = station_ids if isinstance(
            station_ids, list) else list(station_ids)

        station_ids = '"' + '","'.join([str(i) for i in station_ids]) + '"'
        query.insert(1, f'filter (?station_id in ({station_ids}))')
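
For a small subset, the inserted line ends up looking like this (a standalone sketch of the string handling above):

station_ids = [2091, 2613]
ids = '"' + '","'.join(str(i) for i in station_ids) + '"'
print(f'filter (?station_id in ({ids}))')
# prints: filter (?station_id in ("2091","2613"))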

Additionally, as I didn’t manage to split the values back into separate columns after the grouping (or only with a lot of needless verbosity), I wrote a small converter function that mainly extracts the general information (meta-information, if you will, like latitude/longitude), converts the actual measurement data into columns, and concatenates the two.

## a function that converts the response into a dataframe, tailored to the hydrodata
# and that also contains a small validation step to remove "outdated" values
def _response2dataframe(response,
                        validate=True,
                        time_threshold=datetime.timedelta(minutes=30)):
    # return pd.read_csv(io.StringIO(resp.text))
    data = response.text.strip().split('\n')
    columns = data[0].split(',')
    data = [i.strip().split(',') for i in data[1:]]
    index = [i[0] for i in data]
    ix = columns.index('data')
    values = [[i[0]] + i[ix].split(';') for i in data]
    data = [[j for _, j in enumerate(i) if _ != ix]
            for i in data]
    # the dataframe holding all non measurement values
    columns.pop(ix)
    data1 = pd.DataFrame(columns=columns,
                        data=data,
                        index=index)

    for col in data1.columns:
        data1[col] = pd.to_numeric(data1[col], errors='ignore')

    def _list2dict(datalist):
        return {i: j for i, j in zip(datalist[::2], datalist[1::2])}
    values = {i[0]: _list2dict(i[1:])
              for i in values}
    ix = values.keys()
    columns = []

    for i in values.items():

        vartypes = list(i[1].keys())

        for vartype in vartypes:
            if vartype in columns:
                pass
            else:
                columns.append(vartype)
    data = [[] for i in columns]

    for i in ix:
        for colno, colname in enumerate(columns):
            value = values[i][colname] if colname in values[i].keys() else ''
            data[colno].append(value)
    data = np.transpose(data)

    data2 = pd.DataFrame(columns=columns, data=data, index=ix)
    for col in data2.columns:
        if col != 'measurementTime':
            data2[col] = pd.to_numeric(data2[col], errors='coerce')

    data = pd.concat([data1, data2], axis=1)
    data['query_time'] = pd.to_datetime(data['query_time'])
    data['measurementTime'] = pd.to_datetime(data['measurementTime'])

    if validate:
        time_threshold = pd.to_timedelta(time_threshold)
        good = data['query_time'] <= data['measurementTime'] + time_threshold
        # for illustration
        bad = data['query_time'] >= data['measurementTime'] + time_threshold
        data['measurementAge'] = data['query_time'] - data['measurementTime']
        data = data[good]
    return data
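
To get a feeling for what the converter returns, a quick look at the resulting dataframe (df from the __main__ block above) might be useful; which measurement columns exist depends on what the stations currently report:

# meta-information columns plus one column per measurement type found in the data
print(df.columns.tolist())
# stations that currently report a water temperature
print(df.loc[df['waterTemperature'].notnull(),
             ['station_name', 'waterTemperature', 'measurementTime']].head())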

Additionally, we can fix the offset that we found (~400 km in latitude) in the geometry by using the following (after importing my conversion functions):

hw, rw = wgs84_to_ch1903(df['latitude'], df['longitude'], plus=True)
hw = hw + 400000
lat, lon = ch1903_to_wgs84(hw, rw)
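
The conversion functions themselves come from my own module and are not shown here; if you want to reproduce this without it, a minimal sketch based on swisstopo’s published approximation formulas could look like the following (the names, argument order and the plus keyword mirror the usage above, but treat this as an assumption-laden stand-in rather than the exact code):

import numpy as np

def wgs84_to_ch1903(lat, lon, plus=False):
    # approximate WGS84 -> CH1903 (LV03) conversion after swisstopo's formulas,
    # returning (northing/"Hochwert", easting/"Rechtswert")
    phi = (np.asarray(lat, dtype=float) * 3600 - 169028.66) / 10000
    lam = (np.asarray(lon, dtype=float) * 3600 - 26782.5) / 10000
    rw = (600072.37 + 211455.93 * lam - 10938.51 * lam * phi
          - 0.36 * lam * phi**2 - 44.54 * lam**3)
    hw = (200147.07 + 308807.95 * phi + 3745.25 * lam**2 + 76.63 * phi**2
          - 194.56 * lam**2 * phi + 119.79 * phi**3)
    if plus:  # LV95 ("CH1903+") coordinates carry an extra 2'000'000/1'000'000 m offset
        rw, hw = rw + 2000000, hw + 1000000
    return hw, rw


def ch1903_to_wgs84(hw, rw):
    # approximate CH1903 -> WGS84 conversion, returning (lat, lon);
    # LV95 values are detected by their magnitude and shifted back (a convenience assumption)
    hw = np.asarray(hw, dtype=float)
    rw = np.asarray(rw, dtype=float)
    hw = np.where(hw > 1000000, hw - 1000000, hw)
    rw = np.where(rw > 2000000, rw - 2000000, rw)
    y = (rw - 600000) / 1000000
    x = (hw - 200000) / 1000000
    lon = (2.6779094 + 4.728982 * y + 0.791484 * y * x
           + 0.1306 * y * x**2 - 0.0436 * y**3) * 100 / 36
    lat = (16.9023892 + 3.238272 * x - 0.270978 * y**2 - 0.002528 * x**2
           - 0.0447 * y**2 * x - 0.0140 * x**3) * 100 / 36
    return lat, lon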

Now we can see the difference, especially when we plot the dataframe of water temperatures (the labelling is really overkill at this point and just for illustration):

## fix the offset

hw, rw = wgs84_to_ch1903(df['latitude'], df['longitude'], plus=True)
hw = hw + 400000
lat, lon = ch1903_to_wgs84(hw, rw)


# setup the map
import cartopy.crs as ccrs
import cartopy.feature as cf
fig, ax = plt.subplots(dpi=200, figsize=[9, 9], subplot_kw={"projection":ccrs.PlateCarree()})
thevar = 'waterTemperature'
units = {'waterLevel': 'm MSL', 'waterTemperature': '°C', 'discharge': 'm³/s'}

# show the "original" coordinates
s = ax.scatter(df['longitude'], df['latitude'], s=150, edgecolor='k', c=df[thevar], vmin=0, vmax=25, cmap='coolwarm')
# show the fixed coordinates
s = ax.scatter(lon, lat, s=150, edgecolor='k', c=df[thevar], cmap='coolwarm', vmin=0, vmax=25)

# the respective name of the station as label if need be
add_station_label = False
if add_station_label:
    # use distinct names in the loop so we don't overwrite the corrected lon/lat arrays from above
    for sid, thevalue, stat_lon, stat_lat, station_name in zip(df['station_id'], df[thevar], df['longitude'], df['latitude'], df['station_name']):
        if np.isnan(thevalue):
            continue
        ax.text(stat_lon, stat_lat, station_name.replace(';','\n').replace(' ','\n'), va='center', ha='center', fontsize=3, clip_on=True)

# map decorations
ax.set_xlabel('Longitude (°)')
ax.set_ylabel('Latitude (°)')
plt.colorbar(s, aspect=50, pad=0.1, location='bottom', label=f'{thevar.replace("water","").capitalize()}'+f' ({units[thevar]})')
ax.set_title(f'Hydrodata in Switzerland')
ax.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False, lw=0.25, color='k')
# if you want, add a stockimage but it won't look nice at this resolution
#ax.stock_img()

# add coastline and borders to illustrate
ax.coastlines()
ax.add_feature(cf.BORDERS)
# set proper boundaries for illustration
ax.set_ylim(42, 48)
ax.set_xlim(5, 12)

If you wish to plot only data points that have a valid temperature, you can use the notnull() condition on the variable:

# fix the offset
hw, rw = wgs84_to_ch1903(df['latitude'], df['longitude'], plus=True)
hw = hw + 400000
lat, lon = ch1903_to_wgs84(hw, rw)

# setup the map

fig, ax = plt.subplots(dpi=200, figsize=[16, 12], subplot_kw={"projection":ccrs.PlateCarree()})
thevar = 'waterTemperature'
units = {'waterLevel': 'm MSL', 'waterTemperature': '°C', 'discharge': 'm³/s'}

# filter out stations without values for this variable
cond = df[thevar].notnull()
# show the "original" coordinates
s = ax.scatter(df['longitude'][cond], df['latitude'][cond], s=150, edgecolor='k', c=df[thevar][cond], vmin=0, vmax=25, cmap='coolwarm')
# show the fixed coordinates
s = ax.scatter(lon[cond], lat[cond], s=150, edgecolor='k', c=df[thevar][cond], cmap='coolwarm', vmin=0, vmax=25)

# map decorations
ax.set_xlabel('Longitude (°)')
ax.set_ylabel('Latitude (°)')
plt.colorbar(s, aspect=50, pad=0.1, location='bottom', label=f'{thevar.replace("water","").capitalize()}'+f' ({units[thevar]})')
ax.set_title(f'Hydrodata of Switzerland')
ax.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False, lw=0.25, color='k')

# if you want, add a stockimage but it won't look nice at this resolution
#ax.stock_img()

# add coastline and borders to illustrate
ax.coastlines()
ax.add_feature(cf.BORDERS)

# set proper boundaries for illustration
ax.set_ylim(45.5, 48)
ax.set_xlim(5.5, 11)

Going forward, you could use requests to download the past measurement data that are linked in the dataframe via the query (the hydrodata URLs) and make an animation; xarray would probably be a better choice for holding the data at that point, though…
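
As a starting point, a minimal sketch for fetching one of those linked JSON files could look like this (I have not inspected the structure of these files closely, so only the top-level keys are printed):

import requests

# take the first station that actually has a discharge/water level plot linked
urls = df['hydrodata_url_discharge_waterlevel']
url = next(u for u in urls if isinstance(u, str) and u.startswith('http'))
past = requests.get(url).json()
print(url, list(past.keys()))  # inspect the structure before deciding how to store it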

I hope this was an interesting look at how to combine these pieces; going forward, we’ll look at using OpenLayers and the JSON response from LINDAS to make a map.

PS: On another note, while we found an offset last time, there is one station in the dataset that has a wrong longitude as well: station 2270, “Combe des Sarrasins”, which has a latitude of 41.315… and a longitude of 0.296… according to its LINDAS entry, but should really be at 47.195…, 6.877…

Linked data (LINDAS/SPARQL/hydrodata)

TL;DR;

Get the up-to-date data from LINDAS for the currently available values (discharge / water level / water temperature) for all BAFU stations.

PREFIX schema: <http://schema.org/>
PREFIX hg: <http://rdf.histograph.io/>
PREFIX la: <https://linked.art/ns/terms/>
prefix ex: <http://example.org/>

# hydro related prefixes
prefix h: <https://environment.ld.admin.ch/foen/hydro/>
prefix hd: <https://environment.ld.admin.ch/foen/hydro/dimension/>
prefix hgs: <https://environment.ld.admin.ch/foen/hydro/station/>
prefix river: <https://environment.ld.admin.ch/foen/hydro/river/observation/>
prefix lake: <https://environment.ld.admin.ch/foen/hydro/lake/observation/>

# geo related prefixes
prefix wkt: <http://www.opengis.net/ont/geosparql#asWKT>

# hydrodata related prefixes (fake) that we use to reference the historical data.
prefix hydrodata: <https://www.hydrodaten.admin.ch/plots/>





# what we want to get out of the query as "nicely" formatted data
SELECT ?station_id ?station_name ?station_type ?station_type_nice
    ?latitude ?longitude 
    ?station_iri ?station_data_iri ?station_geometry_iri
	  (group_concat(?measurement_type_measurement_value;SEPARATOR=";") as ?data)
		#?hydrodata_url_temperature ?hydrodata_url_discharge_waterlevel
		?query_time 
		# add the https://hydrodaten.admin.ch url for the data that go back 
		# 7 days (actually 40 days sometimes, see dev console)
  		(if(contains(?data,"discharge"), 
  		    uri(concat(str(hydrodata:), 'p_q_7days/',
  		               ?station_id, '_p_q_7days_de.json')), 
          "") 
    		as ?hydrodata_url_discharge_waterlevel)
        (if(contains(?data,"temp"), 
            uri(concat(str(hydrodata:), 'temperature_7days/', 
                ?station_id, '_temperature_7days_de.json')), 
           "") 
    	as ?hydrodata_url_temperature)
 
where {
  		# it seems counterintuitive but getting everything and filtering 
  		# is faster than first getting 
  		# the station list and going through their IRIs
        ?station ?prop ?value.
  		# we only want to keep actual data variables and time
        filter( ?prop in (ex:isLiter, hd:measurementTime, hd:waterTemperature,  hd:waterLevel, hd:discharge))
  		# convert the subject station to the id, which is the last thing after the /;
  		bind(replace(replace(
        	str(?station), str(river:), "", "i"),
      		str(lake:), "", "i")
    		as ?station_id)
  		# the above is equivalent to this in theory but the above 
  		# uses prefixes so would be easier to change
		# in case of changes
  		#bind(strafter(str(?station), "observation/") as ?station_id2)
  		# get the kind of station this is (whether lake or river) 
  		# - the hierarchy/ontology here is weird as all are measurement stations
		# or observation but the distinction is river/lake first then observation
  		bind(strbefore(strafter(str(?station), str(h:)), "/") as ?station_type)
  		# uppercase the first letter -> why is there no capitalize function in SPARQL
  		bind(concat(ucase(substr(?station_type,1,1)), substr(?station_type,2,strlen(?station_type)-1)) as ?station_type_nice)
  		# remove some filler words from the data
		bind(replace(str(?prop), str(hd:), "", "i") as ?measurement_type)
  		# convert the actual measurement value to a literal
        bind(str(?value) as ?measurement_value)
  		# introduce the time of the query to (maybe) see if the values are old
        bind( str(now()) as ?query_time)
  		# combine the type of measurement with the value so we 
  		# can group by and not lose information
  		bind(concat(?measurement_type, ';', ?measurement_value, '') as ?measurement_type_measurement_value) 
    	
  		# start making IRIs so we can check/link back to lindas
  		# the IRI containing information of the station itself (not linked to river/lake)
  		bind(IRI(concat(str(hgs:), ?station_id)) as ?station_iri)
  		# generate the geometry IRI
  		bind(IRI(concat(str(hgs:), concat(?station_id,"/geometry"))) as ?station_geometry_iri)
  		# generate the station IRI that holds the data information
  		bind(IRI(concat(str(h:), ?station_type, "/observation/", ?station_id)) as ?station_data_iri)
  		# get the information about the location of the station
  		?station_geometry_iri wkt: ?coordinates.
  		# also get the name and remove commas to not confuse them with seps
      ?station_iri schema:name ?stat_name.
      bind(replace(?stat_name, ',', ';') as ?station_name)

  		# simplify the geometry to separate lon/lat values
        BIND(STRBEFORE(STRAFTER(STR(?coordinates), " "), ")") AS ?latitude)
        BIND(STRBEFORE(STRAFTER(STR(?coordinates), "("), " ") AS ?longitude)



}
# group by, basically keep everything of the select here
# except the ?measurement_type_measurement_value which we want to combine to one row
# (ideally we would actually get them as separate columns again, but this seems the best we can do for the moment)
group by # the station information
		?station_id ?station_type ?station_type_nice ?station_name 
		?latitude ?longitude
		# the IRIs
		 ?station_iri ?station_data_iri ?station_geometry_iri
		# validation information
		 ?query_time   

This one took quite some time and, as of now, serves little purpose other than SPARQL training for me. Essentially, I was once asked to use “the API” for the current water temperatures that are embedded in my RaspberryPi webcams.

Now, initially I actually web-scraped the JSON data from hydrodata and transferred them to the MCR database, as this allowed the concurrent use of in-house data together with the BAFU data. Additionally, the urban/rural/water temperature graph of dolueg2 uses exactly the same temperature, so it makes sense to keep it in the same database (at least for some of the relevant stations around Basel; B2091 means BAFU station 2091).

Now, going back to “the API” and RDF, where data are stored as triples (subject, predicate, object) to allow reuse and linking, we can get the most recent data of any of the BAFU stations via, e.g.,

prefix hd: <https://environment.ld.admin.ch/foen/hydro/dimension/>
SELECT * WHERE {
  <https://environment.ld.admin.ch/foen/hydro/river/observation/2091> hd:waterTemperature ?obj .
}
# see also the page: https://environment.ld.admin.ch/foen/hydro/river/observation/2091

or some relevant values for us (more than water temperature)

prefix hd: <https://environment.ld.admin.ch/foen/hydro/dimension/>
prefix obs: <https://environment.ld.admin.ch/foen/hydro/river/observation/>


SELECT * WHERE {
  obs:2091 ?prop ?obj .
  filter( ?prop in (hd:measurementTime, hd:waterTemperature,  hd:waterLevel, hd:discharge))

}

The above is an excerpt of the output from the LINDAS SPARQL endpoint at the time of writing. Notably, we can also get the information for the station (note the added prefix):

prefix station: <https://environment.ld.admin.ch/foen/hydro/station/>
SELECT * WHERE {
  station:2091 ?prop ?obj .
}

And as you can see, there is also a geometry linked to the station data (shown without query this time):

Cool – except for the fact that the southernmost point of Switzerland is located at 45°49’N. And the station in question (id 2091) is in Rheinfelden, i.e. in the North of Switzerland. So something seems off here and I informed the responsible person in the hopes that they’ll fix that soon.

Now, initially I thought I’d first extract the IDs of the relevant stations and then get their properties/values of interest, but somehow that query runs rather slowly (6+ seconds). This could probably be fixed, and if someone wants to try, the following should give a nicely formatted data table (I also assume that using the measurement time as the trigger might not be a good choice once more and more datasets become available on LINDAS):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# hydro related prefixes
prefix h: <https://environment.ld.admin.ch/foen/hydro/>
prefix hd: <https://environment.ld.admin.ch/foen/hydro/dimension/>
prefix hydro_geo: <https://environment.ld.admin.ch/foen/hydro/station/>
# geo related prefixes
prefix wkt: <http://www.opengis.net/ont/geosparql#asWKT>

select ?station_iri ?station_id ?station_type ?coordinates ?station_coordinates ?query_time ?measurement_time ?waterTemperature ?discharge ?waterLevel 
where {
  
    ?station hd:measurementTime ?measurement_time.
    optional {?station hd:waterLevel ?waterLevel. }
    optional { ?station hd:discharge ?discharge.}
  	optional { ?station hd:waterTemperature ?waterTemperature.}
  # measurement related things
  bind(now() as ?query_time)
  bind(strbefore(strafter(str(?station), str(h:)), "/") as ?station_type)
  bind(strafter(str(?station), "observation/") as ?station_id)
  # geo related things
  # first find all the relevant subjects, i.e. station property pages
  bind(IRI(concat(str(hydro_geo:), concat(?station_id,"/geometry"))) as ?station_geometry_iri)
  bind(IRI(concat(str(hydro_geo:), ?station_id)) as ?station_iri)
  ?station_geometry_iri wkt: ?coordinates.
  BIND(strafter(strbefore(str(?coordinates),")"), "POINT(") AS ?station_coordinates)
  # can be used for yasgui and nicer representation but slows the query
  #bind(xsd:decimal(?station_id) as ?station_id_numeric)
} 
#order by desc(?station_id_numeric)
  

As you can see, it’s important to use the OPTIONAL keyword, as not all stations have all variables (water temperature / discharge / level). Additionally, I retrieve the kind of station (lake/river) that is available. The choice of prefix for these feels a bit odd to me, as the hierarchy seems to be river/lake first and then station instead of the other way around – which must have a reason that I’m too inexperienced to understand. The rest of the query is cosmetics to get slightly nicer values (keep the geometry as a proper type and you can get a map of the observations, though).

As seen, this runs rather slowly (6+ seconds for around 230 records is too slow on any day), so I started experimenting, ultimately arriving at the code at the top (here shortened to remove the geometry and hydrodata links): first filter for the relevant dimensions, essentially getting the data first, then clean up a bit and concatenate by station id.

PREFIX schema: <http://schema.org/>
PREFIX hg: <http://rdf.histograph.io/>
PREFIX la: <https://linked.art/ns/terms/>
prefix ex: <http://example.org/>

# hydro related prefixes
prefix h: <https://environment.ld.admin.ch/foen/hydro/>
prefix hd: <https://environment.ld.admin.ch/foen/hydro/dimension/>
prefix hgs: <https://environment.ld.admin.ch/foen/hydro/station/>
prefix river: <https://environment.ld.admin.ch/foen/hydro/river/observation/>
prefix lake: <https://environment.ld.admin.ch/foen/hydro/lake/observation/>

# geo related prefixes
prefix wkt: <http://www.opengis.net/ont/geosparql#asWKT>


# what we want to get out of the query as "nicely" formatted data
SELECT ?station_id ?station_name ?station_type ?station_type_nice ?station_iri ?station_data_iri 
	  (group_concat(?measurement_type_measurement_value;SEPARATOR=", ") as ?data)
		#?hydrodata_url_temperature ?hydrodata_url_discharge_waterlevel
		?query_time  
where {
  		# it seems counterintuitive but getting everything and filtering is faster than first getting 
  		# the station list and going through their IRIs
        ?station ?prop ?value.
  		# we only want to keep actual data variables and time
        filter( ?prop in (hd:measurementTime, hd:waterTemperature,  hd:waterLevel, hd:discharge))
  		# convert the subject station to the id, which is the last thing after the /;
  		bind(replace(replace(
        	str(?station), str(river:), "", "i"),
      		str(lake:), "", "i")
    		as ?station_id)
  		# the above is equivalent to this in theory but the above uses prefixes so would be easier to change
		# in case of changes
  		#bind(strafter(str(?station), "observation/") as ?station_id2)
  		# get the kind of station this is (whether lake or river) - the hierarchy/ontology here is weird as all are measurement stations
		# or observation but the distinction is river/lake first then observation
  		bind(strbefore(strafter(str(?station), str(h:)), "/") as ?station_type)
  		# uppercase the first letter -> why is there no capitalize function in SPARQL (╯°□°)╯︵ ┻━┻
  		bind(concat(ucase(substr(?station_type,1,1)), substr(?station_type,2,strlen(?station_type)-1)) as ?station_type_nice)
  		# remove some filler words from the data
		bind(replace(replace(replace(
            str(?prop), str(hd:), "", "i"),
            "water", "", "i"),
            "measurementTime", "measurement_time", "i") 
            as ?measurement_type)
  		# convert the actual measurement value to a literal
        bind(str(?value) as ?measurement_value)
  		# introduce the time of the query to (maybe) see if the values are old
        bind( str(now()) as ?query_time)
  		# combine the type of measurement with the value so we can group by and not lose information
  		bind(concat(?measurement_type, ': ', ?measurement_value, '') as ?measurement_type_measurement_value) 
    	
  		# start making IRIs so we can check/link back to lindas
  		# the IRI containing information of the station itself (not linked to river/lake)
  		bind(IRI(concat(str(hgs:), ?station_id)) as ?station_iri)
  	
  

}
# group by, basically keep everything of the select here
# except the ?measurement_type_measurement_value which we want to combine to one row
# (ideally we would actually get them as separate columns again, but this seems the best we can do for the moment)
group by # the station information
		?station_id ?station_type ?station_type_nice ?station_name 
		# the IRIs
		 ?station_iri ?station_data_iri
		# validation information
		 ?query_time   

This query is the reduced version of the one at the top. It runs in around 1 second or less and can now be used for automated queries or other products, as described here on LINDAS. The only additions are the geometry and the URLs for the hydrodata website that give you the historical data, needed for a product I have in mind that might come up later. And maybe at some point we can also get past data on LINDAS, which would make the project I have in mind easier.
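
If you want to verify the runtime difference yourself from Python, a rough timing sketch against the LINDAS endpoint could look like this (query_slow and query_fast stand for the two query strings above; endpoint and CSV header as in the Python post further up):

import time
import requests

def time_query(query, endpoint="https://ld.admin.ch/query"):
    # POST the query form-encoded (requests handles the encoding) and ask for a CSV result
    start = time.perf_counter()
    resp = requests.post(endpoint, data={"query": query}, headers={"Accept": "text/csv"})
    print(f"{resp.status_code}: {len(resp.text.splitlines()) - 1} rows",
          f"in {time.perf_counter() - start:.2f} s")
    return resp

# time_query(query_slow)  # the OPTIONAL-based variant
# time_query(query_fast)  # the filter-first variant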

Update:
The offset in the geometry seems to be a consistent 400’000 m in “Hochwert”/latitude, based on three stations picked from hydrodata and LINDAS, while the longitude is almost correct:

b = 2634026 , 1125901 # 2531
a = 7.854351126515431, 42.67878915566025 # 2531
a = wgs84_to_ch1903(*a[::-1], plus=True)[::-1]
print([i - j for i, j in zip(a, b)])
#out: [132.7675012690015, -401020.22037598956]

b = 2627189 , 1267845 # 2091
a = 7.777036377264884, 43.96335989872657 # 2091
a = wgs84_to_ch1903(*a[::-1], plus=True)[::-1]
print([i - j for i, j in zip(a, b)])
#out: [1.471713425591588, -399994.8737920185]

b = 2830800 , 1168706 # 2617
a = 10.265672193049967, 43.036021653211186 # 2617
a = wgs84_to_ch1903(*a[::-1], plus=True)[::-1]
print([i - j for i, j in zip(a, b)])
#out: [48.71478863572702, -399969.5114403835]

backing off – extending the wait for server response via python

TL;DR; Add a decorator to avoid server timeouts

When getting swisstopo altitude profiles, the response time is in general quite good, but every now and then I ran into timeouts, be it due to my bad connection or too many requests to the server. To remedy this, I added the decorator below (you can get the most up-to-date version from my GitHub gists) to the original swisstopo function (or any other function that might run into some form of timeout).

def exponential_retry(max_retries, delay):
    """Retry a function with exponentially increasing (slightly random) wait time between tries.

    Usage: decorate another function via @exponential_retry(5, 1)
    """
    def decorator_exponential_retry(func):
        import functools

        @functools.wraps(func)
        def wrapper_exponential_retry_decorator(*args, **kwargs):
            import random
            import time
            retries = 0
            while retries <= max_retries:
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    print(f"Attempt {retries + 1} failed: {e}")
                    retries += 1
                    sleeptime = (delay * 2 ** retries + random.uniform(0, 1))
                    print(f"Retrying in {sleeptime:.2f} seconds...")
                    time.sleep(sleeptime)
            raise Exception("Maximum number of retries reached, too many timeouts/errors.")
        return wrapper_exponential_retry_decorator
    return decorator_exponential_retry
    
 
@exponential_retry(5, 1)
def get_swisstopo_elevation_profile(...):
    # source code of get_swisstopo_elevation_profile
    ...
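
The decorator is not tied to the swisstopo function; any callable that may raise (timeouts, HTTP errors, flaky parsing) can be wrapped the same way. A small self-contained sketch (the URL is only a placeholder):

import requests

@exponential_retry(3, 1)
def fetch_json(url):
    # raise_for_status() turns HTTP errors into exceptions, which triggers a retry
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    return resp.json()

# data = fetch_json("https://example.org/some.json")  # placeholder URL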

No network after disconnecting VPN (cisco/global protect) on Linux (with systemd)

sudo systemctl restart systemd-resolved.service

VPN clients are often used to get access to internal infrastructure. I often had to use them to enter time tracking, connect to servers, or similar. However, on my usual device running Linux, internet connectivity wasn’t working properly after I disconnected (despite still being connected to the network). This was usually an issue of resolving addresses and not per se a connection issue. The one-liner above restarts the systemd resolver, which brings the connection back when run. Hope this helps somebody who faces the same issue.

overlay an image onto another one

TL;DR; Use the below code to insert one image onto another at a fixed position (scaled down).

The code below draws an image onto another existing picture. The use case is straightforward: insert a logo or similar at a fixed position to enhance your figures. As always, check out the related gist on my GitHub, which contains the most up-to-date version.


def overlayimage(overlaypicture,
                 baseimg,
                 outfile='same',
                 # position in relative coordinates for icon/other image
                 position=[0.1, 0.1],
                 # scaling of image
                 width=0.2,
                 # scaling of image
                 height=0.2,
                 # if the height/width do not fit the aspect
                 # of the image, adjust it automatically
                 keepaspect=True,
                 # the adjustment is by default for width, change to True
                 # to scale width instead
                 adjustheight=False,
                 # use the top corner (usual for images)
                 # set to False, as mainly we think from lower left as 0,0
                 # in science
                 fromtop=False,
                 # if outfile is not changed, input is required
                 # set force to change the original without checking
                 force=False,
                 ):

    import PIL

    if outfile == 'same':
        if not force:
            print('Are you sure you want to overlay it on the same picture?',
                  'Set force to suppress this warning')
            from time import sleep

            try:
                print("Press Ctrl+C within 10 seconds and then Y/N")
                for i in range(0, 10):  # waits 10 seconds
                    sleep(1)
            except KeyboardInterrupt:
                ans = input()
                if (ans.lower()[0] in ['y', 'j'] and 'n' not in ans.lower()):
                    force = True
                else:
                    print('There was an unrecognized character!',
                          'Are you sure you know how this works?',
                          'Let\'s try again...')
                    return overlayimage(overlaypicture,
                                        baseimg,
                                        position=position,
                                        width=width,
                                        height=height,
                                        keepaspect=keepaspect,
                                        adjustheight=adjustheight,
                                        fromtop=fromtop,
                                        force=force)

        if force:
            outfile = baseimg
        else:
            # derive a new name from the base image instead of overwriting it
            outfile = baseimg.split('.')
            outfile[0] += '_overlay'
            outfile = '.'.join(outfile)

    if width > 1:
        width /= 100

    if height > 1:
        height /= 100

    position = [p / 100 if p > 1 else p for p in position]
    background = PIL.Image.open(baseimg)
    foreground = PIL.Image.open(overlaypicture)
    if keepaspect:
        if adjustheight:
            height *= foreground.height / foreground.width
        else:
            width *= foreground.width / foreground.height
        foreground = foreground.resize((int(width*background.width),
                                        int(height*background.height)),
                                       PIL.Image.LANCZOS)
    else:
        foreground = foreground.resize((int(width*background.width),
                                        int(height*background.height)),
                                       PIL.Image.LANCZOS)

    # 4-tuple defining the left, upper, right, and lower pixel corners
    # since we usually define this from lower left but images are from upper
    # left the proper position is 1-height-ypos
    pos = [int(position[0]*background.width),
           int((1-position[1]-height)*background.height)]

    background.paste(foreground, pos, foreground)
    background.save(outfile)
    return outfile


if __name__ == '__main__':
    print('*'*10, 'HELP for overlayimage', '*'*10)
    print('Put another picture into one that already exists')
    print('Output file is by default the same; the position is 10% from the',
          'lower left corner by default but can be adjusted as needed')

The idea is to enhance your graph with a logo or icon that better portrays what is shown, e.g. insert an animal or similar instead of just text for your tick marks. While this could be done manually, that feels too tedious if you have to do it many times. As this utility edits saved images, it uses PIL to handle opening, resizing and pasting the images.

The usage is relatively straightforward: pass in the two filenames, first the image to be overlaid and second the existing base image. If you are certain the placement is fine (whatever you enter for the position; by default it is 10% from the left and 10% from the bottom, as the placement is computed from the lower-left corner), you can overwrite the file in place (outfile='same'). I recommend first trying out what you want to add where. Ideally, this utility is either combined with images that all share the same layout, or you get the position directly from the script where you create your images.
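
For example, a call could look like this (the file names are placeholders; the position is relative to the lower-left corner, as noted above):

# put a logo into the lower-right corner of an existing figure, roughly 15% wide
outfile = overlayimage('logo.png', 'figure.png',
                       outfile='figure_logo.png',
                       position=[0.8, 0.05],
                       width=0.15)
print(outfile)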

Generally, the aspect ratio of the image to be overlaid is kept the same and the width is scaled accordingly, but this can also be changed via keyword. Hope this helps someone out there.

My implementation of a touchdown event via javascript

TL;DR; The below code can be added as a script to your page to implement a touchdown event detection and directly attach it to a “menu” element

async function swipedetect(el, callback, threshold=50, minimumTime=150, allowedTime=2000){

    var touchsurface = el,
    swipedir,
    startX,
    startY,
    distX,
    distY,
    threshold = threshold, //required min distance traveled to be considered swipe in X
//    restraintX = restraintX, // maximum distance allowed at the same time in perpendicular direction in X
//    thresholdY = thresholdY, //required min distance traveled to be considered swipe in Y
//    restraintY = restraintY, // maximum distance allowed at the same time in perpendicular direction in Y
    minimumTime = minimumTime, // maximum time allowed to travel that distance
    allowedTime = allowedTime, // maximum time allowed to travel that distance
    elapsedTime,
    startTime,
    handleswipe = callback || function(swipedir){}

    touchsurface.addEventListener('touchstart', function(e){
        var touchobj = e.changedTouches[0]
        swipedir = 'none'
        dist = 0
        startX = touchobj.pageX
        startY = touchobj.pageY
        startTime = new Date().getTime() // record time when finger first makes contact with surface
        // e.preventDefault()
    }, false)

    touchsurface.addEventListener('touchmove', function(e){
        e.preventDefault() // prevent scrolling when inside DIV
    }, false)

    touchsurface.addEventListener('touchend', function(e){
        var touchobj = e.changedTouches[0]
        distX = touchobj.pageX - startX; // get horizontal dist traveled by finger while in contact with surface
        distY = touchobj.pageY - startY; // get vertical dist traveled by finger while in contact with surface
        dist = distY * distY + distX * distX;
        dist = Math.sqrt(dist);
        elapsedTime = new Date().getTime() - startTime; // get time elapsed
        // calculate the angle of the movement
        angle = (Math.atan2(distY, distX) / Math.PI * 180)
        // convert angle to a sector of the movement
        // offsetting it by 45 degrees to make the sector mapping easier
        // and ensuring it's positive
        sector = (angle + 45 + 360) % 360;
        sector = Math.floor(sector/90);
        if (elapsedTime >= minimumTime && elapsedTime <= allowedTime && dist >= threshold){ // first condition for a swipe met
            switch (sector) {
               case 0: swipedir = "right"; break;
               case 1: swipedir = "up"; break;
               case 2: swipedir = "left"; break;
               case 3: swipedir = "down"; break;
            }
        }
        handleswipe(swipedir)
        if (dist >= threshold) {
          e.preventDefault()
        }
    }, false)
}

async function swipe_enable(elementid='menu', table={'right':'a', 'up':'w', 'left':'d', 'down':'s', 'none': null}) {
  var el = document.getElementById(elementid);
  if (el) {
    swipedetect(el, async function(swipedir){
      //swipedir contains either "none", "left", "right", "up", or "down"
      let key = table[swipedir];
      document.dispatchEvent(new KeyboardEvent('keypress', {'key': key}));
    })
  }
}

Handling a touchdown event via vanilla JS is nothing new, and I took the initial implementation from a website I cannot find anymore. The structure was similar to this gist and the related discussion: initially I simply added a touchdown listener, but the detection of right/left/up/down swipes was patchy until I also added the direction sector. Additionally, I did not want to attach it to each element by hand every time, so I added another function that handles the attaching.

Now I can simply keep the above in a js-lib file and source it on our data visualisation page, then call swipe_enable() in the doc.ready function and be certain that swiping is handled on the corresponding element (with selects in this case). This makes using the page far easier on mobile devices; highly relevant if you are in the field, like in CLOUDLAB, and your phone is the only thing available due to the windy/rainy/snowy weather.

Get the most recent census data from the Basler Atlas

TL;DR;

Get the census data (and more) from the Basler Atlas, which is kept up to date by the Statistisches Amt Basel. The up-to-date version of the code can be found on my GitHub Gists. Don’t forget to credit the data source, i.e. the Stat. Amt Basel, when you use their data; it is added automatically to the “meta” part of the data container (in addition to the URL, the name and the short name of the dataset).

import datetime


headers = {'User-Agent': 'Mozilla/5.0'}


def data2csv(data,
             outpath='./',
             prefix='',
             suffix='',
             filesep='_',
             datasep=',',
             ext='.csv',
             quiet=True,
             ):

    import os
    if isinstance(data, dict) and 'data' not in data.keys():
        of = []
        for key in data.keys():
            _of = data2csv(data[key],
                           outpath=outpath,
                           prefix=prefix,
                           filesep=filesep,
                           datasep=datasep,
                           ext=ext,
                           quiet=quiet,
                           )
            if isinstance(_of, list):
                of += _of
            else:
                of.append(_of)
        return of

    outpath = os.path.expanduser(outpath)
    filename = filesep.join([prefix,
                             data['meta']['theme'],
                             data['meta']['subtheme'],
                             _nivgeo2nicename(data['meta']['nivgeo']),
                             suffix,
                             ])
    filename = filename.lstrip(filesep).rstrip(filesep) + ext
    os.makedirs(outpath, exist_ok=True)
    if not quiet:
        print('Saving as', outpath+filename)
    with open(outpath+filename, 'w') as file_obj:
        file_obj.writelines(datasep.join(data['header']) + '\n')

        datalines = [datasep.join(map(lambda x: str(x), line))
                     for line in data['data']]
        file_obj.writelines('\n'.join(datalines) + '\n')
    return outpath+filename


def _get_topic_url():
    url = 'https://www.basleratlas.ch/GC_listIndics.php?lang=de'
    return url


def _get_data_url():
    # used to be the first url at around 2018
    # "https://www.basleratlas.ch/GC_ronds.php?lang=de&js=1&indic=bevstruk.alter0&nivgeo=wge&ts=1&serie=2015"
    url = 'https://www.basleratlas.ch/GC_ronds.php'
    url = 'https://www.basleratlas.ch/GC_indic.php'
    return url


def _get_references_url():
    # e.g. https://www.basleratlas.ch/GC_refdata.php?nivgeo=wbl&extent=extent1&lang=de
    url = 'https://www.basleratlas.ch/GC_refdata.php'
    return url


def get_ref_data(nivgeo, slimmed=True):
    import requests
    payload = {'nivgeo': nivgeo,
               'lang': 'de', 'extent': 'extent1'}
    refs = requests.get(_get_references_url(),
                        params=payload,
                        headers=headers)
    refdata = refs.json()['content']['territories']
    # contains the actual numbering in the main geographical administrative
    # dataset that it can be linked against
    if slimmed:
        return refdata['libgeo'], refdata['codgeo']
    else:
        return refdata


def _replace_error_value(value, error_value, replace_value):
    return replace_value if value <= error_value else value


def get_basler_atlas_data(nivgeo,
                          theme,
                          subtheme,
                          year,
                          quiet=True,
                          error_value=-9999,
                          replace_value=0,
                          empty_value='',
                          ):
    import requests
    from requests.exceptions import JSONDecodeError

    payload = _params2payload(nivgeo, theme, subtheme, year=year)

    response = requests.get(_get_data_url(),
                            params=payload,
                            headers=headers)
    if response.ok:

        try:
            data = response.json()['content']['distribution']
        except JSONDecodeError:
            print(f'issues with {response.url} and the payload {payload}')
            return None, None
        if not quiet:
            print(f'Got data from {response.url}, transforming ...')

        values = data['values']

        if 'sortIndices' in data:
            # reported values are to be sorted, usual case?
            indices = data['sortIndices']
        else:
            # reported values are sorted already, usual case for Bezirk?
            indices = range(len(values))
        if isinstance(values, dict):
            keys = list(values.keys())
            indices = range(len(values[keys[0]]))
            values = [[_replace_error_value(values[key][i],
                                            error_value,
                                            replace_value,
                                            )
                      for key in keys]
                      for i in sorted(indices)
                      ]
            data = values
            return keys, data
        else:
            data = [str(_replace_error_value(values[i],
                                             error_value,
                                             replace_value,
                                             )
                        )
                    for i in sorted(indices)]
            return None, data
    else:
        if not quiet:
            print(f'Request for {payload} failed')
        return None, None


def _nivgeo2map(nivgeo):
    lookuptable = {'block': 'map5',
                   'wbl': 'map5',
                   'bezirk': 'map6',
                   'wbe': 'map6',
                   'viertel': 'map2',
                   'wvi': 'map2',
                   'gemeinde': 'map3',
                   'wge': 'map3',
                   # 'wahlkreis': 'map7',
                   # 'pwk': 'map7',
                   }

    return lookuptable[nivgeo]


def _get_nivgeos():
    return 'wbe', 'wvi', 'wge', 'wbl'  # , 'pwk'


def _nicename2nivgeo(nivgeo=None):
    nicenames = _nivgeo2nicename(nivgeo=nivgeo)
    if nivgeo is None:
        return {v: k for k, v in nicenames.items()}
    else:
        return {nicenames: nivgeo}


def _nivgeo2nicename(nivgeo=None):
    names = {'wbe': 'bezirk',
             'wvi': 'viertel',
             'wge': 'gemeinde',
             'wbl': 'block',
             # 'pwk': 'wahlkreis',
             }

    if nivgeo is None:
        return names
    else:
        return names[nivgeo]


def _params2payload(nivgeo,
                    theme,
                    subtheme,
                    year=None):

    payload = {'lang': 'de',
               'dataset':  theme,
               'indic': subtheme,
               'view': _nivgeo2map(nivgeo),
               }

    if year is not None:
        payload['filters'] = 'jahr='+str(year)

    return payload


def get_basler_atlas(start_year=1998,
                     end_year=datetime.date.today().year,
                     # population has different ages,
                     # men m , women w and a grand total
                     themes={'bevstruk': ['alter0',
                                          'alter20',
                                          'alter65',
                                          'w',
                                          'm',
                                          'gesbev'],
                             'bevheim': ['anteil_ch',
                                         'anteil_bs',
                                         'anteil_al',
                                         'anteil_bsanch',
                                         # "gesbev" has been replaced by
                                         'gesbev_f',
                                         ],
                             'bau_lwg': ['anzahl_lwg',
                                         ],
                             },
                     geographical_levels='all',
                     error_value=-9999,
                     replace_value=0,
                     testing=False,
                     quiet=True,
                     ):
    _nicenames = _nicename2nivgeo()

    if geographical_levels == 'all':
        nivgeos = _get_nivgeos()
    else:
        if isinstance(geographical_levels, str):
            geographical_levels = [geographical_levels]
        _nivgeos = _get_nivgeos()

        nivgeos = [i if i in _nivgeos else _nicenames[i]
                   for i in geographical_levels]

        assert all([i in _nivgeos or i in _nicenames
                    for i in nivgeos])

    # the defaults that we know of - there is wahlbezirke too on the
    # atlas but we don't particularly care about that one...
    data = {}
    
    # mapping of information from topic url to meta information entries
    info2meta = {'url': 'c_url_indicateur',
                 'name': 'c_lib_indicateur',
                 'short_name': 'c_lib_indicateur_court',
                 'unit': 'c_unite',
                 'source': 'c_source',
                 'description': 'c_desc_indicateur',
    }
    for nivgeo in nivgeos:
        refname, refnumber = get_ref_data(nivgeo)
        refdata = [[_refname, _refnumber]
                   for _refname, _refnumber in zip(refname, refnumber)]
        # ids of the nivgeo is in refdata['codgeo']
        # names of the nivgeo is in refdata['libgeo']
        nicename = _nivgeo2nicename(nivgeo)

        for theme in themes:
            if not quiet:
                print(f'Working on {theme} for {_nivgeo2nicename(nivgeo)}')
            for subtheme in themes[theme]:
                if not quiet:
                    print(f'Working on {theme}.{subtheme} for ',
                          f'{_nivgeo2nicename(nivgeo)}')
                # force a copy of refdata otherwise we keep updating the old list of lists.
                container = {'data': [i.copy() for i in refdata],
                             'meta': {'theme': theme,
                                      'nivgeo': nivgeo,
                                      'subtheme': subtheme, },
                             'header': ['referencename', 'referencenumber'],
                             }
                topicinfo = get_basler_atlas_topics(theme=theme,
                                                    subtheme=subtheme,
                                                    fullinfo=True)
                for key, value in info2meta.items():
                    container['meta'][key] = topicinfo[theme][subtheme][value]

                # values will be nested, adjust header line with extra
                for year in range(start_year, end_year+1):

                    if not quiet:
                        print(f'Getting data for {year} for {theme}',
                              f'{subtheme} for {_nivgeo2nicename(nivgeo)}')

                    keys, thisdata = get_basler_atlas_data(nivgeo,
                                                           theme,
                                                           subtheme,
                                                           year,
                                                           quiet=quiet,
                                                           )
                    if thisdata is None:
                        if not quiet:
                            print(f'Failure to get data for {year} for {theme}',
                                  f'{subtheme} for {_nivgeo2nicename(nivgeo)}')
                        thisdata = [''] * len(container['data'])

                    if keys is None:
                        container['header'] += [f'{year}']
                    else:
                        container['header'] += [f'{year}_{key}'
                                                for key in keys]

                    for i, value in enumerate(thisdata):
                        if not isinstance(value, list):
                            value = [value]
                        container['data'][i] += value

                    if testing:
                        break  # year

                data[nicename+'_'+theme+'_'+subtheme] = container

                if testing:
                    break  # specific theme
            if testing:
                break  # theme
        if testing:
            break  # nivgeo

    return data


def get_basler_atlas_topics(theme=None,
                            subtheme=None,
                            fullinfo=False):
    import requests
    if subtheme is not None and isinstance(subtheme, str):
        subtheme = [subtheme]
    payload = {"tree": "A01", 'lang': 'de'}
    if theme is not None:
        payload['theme'] = theme
    response = requests.get(_get_topic_url(),
                            params=payload,
                            headers=headers)
    topiclist = response.json()['content']['indics']

    data = {}
    for topicinfo in topiclist:
        maintopic, subtopic = (topicinfo['c_id_dataset'],
                               topicinfo['c_id_indicateur'])
        if maintopic not in data:
            if fullinfo:
                data[maintopic] = {}
            else:
                data[maintopic] = []

        if subtheme is not None and subtopic not in subtheme:
            continue
        if fullinfo:
            data[maintopic][subtopic] = topicinfo
        else:
            data[maintopic] += [subtopic]

    return data
  
if __name__ == '__main__':
  print("""
        Some example usages:
        # get some information for the population for the years 2000 to 2003
        data = get_basler_atlas(end_year=2003, 
                                start_year=2000, 
                                themes={'bevheim': ['anteil_ch','anteil_bs', ]},
                                geographical_levels='wvi',
                                quiet=True)
        # just get everything that is a predefined topic: 
        themes={'bevstruk': ['alter0',
                             'alter20',
                             'alter65',
                             'w',
                             'm',
                             'gesbev'],
                'bevheim': ['anteil_ch',
                            'anteil_bs',
                            'anteil_al',
                            'anteil_bsanch',
                            # "gesbev" has been replaced by
                            'gesbev_f',
                            ],
                'bau_lwg': ['anzahl_lwg',
                            ],
        }
        # limit just by years
        data = get_basler_atlas(themes=themes, start_year=2000, end_year=2010)
        # also save the data to csv files (by theme and subtheme)
        data2csv(data)
        # get some information regarding the available topics
        themes = get_basler_atlas_topics()
        """)

In the years 2014 to 2020 or so I was teaching “An introduction to Geoinformatics” at the University of Basel. As a starter, students were supposed to load the administrative units of Basel as shapefiles into a GIS (ArcMap or QGIS, I supported both) and join them with population data (age groups or similar). These data can be downloaded from the Statistisches Amt Basel, usually as Excel sheets – which a GIS can handle, but not comfortably: sometimes I even struggled to get the separator in the Excel sheet recognised automatically for the later import (which is just one reason why I prefer QGIS). Moreover, the updated numbers were sometimes published later than I would have liked – even though the Basler Atlas already had them. So I went to work and scraped the data directly from that source, because I did not want to download the data manually every semester again, and for several datasets individually. We can do better than that.

Under the hood, the Basler Atlas uses geoclip, a product from a French company (you also see their logo when opening the Atlas). Relevant for me was that with the browser's inspect feature you can see the requests being made in the background and where the data come from (you can of course also click on a request to see the direct answer).

As you can see, there is a listIndics and an indic API endpoint. listIndics gives an overview of the datasets, indic gives the data of the “indices”. Both require a specific payload to work, the most important entry being {‘lang’: ‘de’}. There is, however, a third endpoint at GC_refdata.php that gives the crucial reference information, namely the name/id of each administrative feature. Initially, I only used the indic endpoint to collect the data directly and the reference endpoint to get the reference frame. Combining requests to both, the data can be collected automatically – whether for the first time or as an update – and come back as a dict, including a header line and meta information for the later export (also included above). Now you can get your up-to-date data from the Basler Atlas as well and have a better time using it for temporal analysis, by running the above code and then calling something like this:

data = get_basler_atlas(geographical_levels='block',
                        themes={'bevstruk': ['alter0',
                                             'alter20',
                                             'alter65',
                                             'w',
                                             'm',
                                             'gesbev'], },
                        quiet=True)

This gives you the population data on the block level for the years from 1998 until today (2024 as of this writing). Calling the line below saves the data to your home directory, namely as 6 CSV files (one per subtheme).

data2csv(data, outpath='~/')
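If you would rather inspect the returned dict directly in Python before (or instead of) exporting it, each entry holds the data rows plus a header line and the meta information collected from the topic endpoint. A minimal sketch – the exact key depends on the nice name of the geographical level, so the lookup below is kept generic and the example values in the comments are assumptions:

# every entry is keyed as nicename_theme_subtheme
key = next(k for k in data if k.endswith('bevstruk_alter0'))
container = data[key]

print(container['header'][:4])    # e.g. ['referencename', 'referencenumber', '1998', ...]
print(container['meta']['unit'])  # unit of the indicator, taken from the topic info
print(container['data'][0][:4])   # first administrative unit with its first values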

Today I extended the code above to also get the available topics (i.e. the categories), which allows us to discover topics or download data that would otherwise take some time to find. For example:

geolevel = 'block'
topics = get_basler_atlas_topics()

for key, values in topics.items():
    if key != 'bevstruk':
        continue
    data = get_basler_atlas(themes={key: values},
                            quiet=True,
                            geographical_levels=geolevel)
    of = data2csv(data, outpath='~/')
    break

As always, I hope this helps someone to discover data or save some time. And maybe at some point I’ll find the time to document everything a bit better. Don’t forget to credit the data source, i.e. the Stat. Amt Basel, when you use their data.

Update/Fix

As I noticed today, there was an issue with the previous version of this script, namely a “deepcopy” issue that caused the data container to hold data from all connected subthemes when more than one was requested. Oops. It should be fine now; the gist on top is updated and some usage information has been added as well. 🙂
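To illustrate the pitfall (this snippet is not part of the gist): copying only the outer list still shares the inner row lists, so every subtheme kept appending its values to the same rows.

refdata = [['Riehen', '1'], ['Bettingen', '2']]

shared = list(refdata)                    # outer list copied, inner rows still shared
fixed = [row.copy() for row in refdata]   # what the corrected code does

shared[0] += ['value_from_subtheme_a']
print(refdata[0])  # ['Riehen', '1', 'value_from_subtheme_a'] -> leaked into the source
print(fixed[0])    # ['Riehen', '1'] -> unaffected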

GIF frame extraction made easy (sci. vis. util)

TL;DR;Summary

Extract the single frames of a GIF to repurpose them (for example to create a video) with the below code. As always, the most up-to-date version can be found in the GitHub gist.

def gif2imgs(file,
             save=False,
             outpath='same',
             outfilename='same',
             ):
    """
    Extract image array from a GIF file.

    Based on PIL, frames of a GIF file are extracted to numpy arrays. If
    more than one frame is present in the GIF, a list of numpy arrays is
    returned. It is possible to save the images directly, by default in the
    same path as the GIF and with the same basename.

    Parameters
    ----------
    file : str
        The GIF file containing one or more image frames.
    save : bool, optional
        Whether to save the image frames directly. The default is False, i.e.
        the function returns a list of the image frames as separate numpy arrays.
    outpath : str, optional
        Where to save the extracted image frames.
        The default is 'same', i.e. the same folder as the GIF.
    outfilename : str, optional
        How to name the extracted image files.
        The default is 'same', i.e. taking the basename and adding _frameX
        to it. If a string is passed, the same suffix is added but the basename
        is taken from the passed-in string.

    Returns
    -------
    list of frames, or two lists (filenames of the extracted images and the images)
        The extracted frames from the GIF and, if saving was turned on,
        also the filenames of the written-out frames.

    """

    import PIL
    import numpy as np
    import os
    import matplotlib.image as img

    pilIm = PIL.Image.open(file)
    pilIm.seek(0)

    # Read all images inside
    images = []
    try:
        while True:
            # Get image as numpy array
            tmp = pilIm.convert()  # Make without palette
            a = np.asarray(tmp)
            if len(a.shape) == 0:
                continue
            # Store, and next
            images.append(a)
            pilIm.seek(pilIm.tell()+1)
    except EOFError:
        pass

    if save:
        if outpath == 'same':
            outpath = os.path.dirname(file) + os.sep
        if outfilename == 'same':
            outfilename = os.path.basename(file)

        outfiles = []
        for frameno, frame in enumerate(images):
            # strip the .gif extension safely (rstrip('.gif') would also eat
            # trailing 'g', 'i' or 'f' characters of the basename)
            basename = os.path.splitext(outfilename)[0]
            _ = outpath + basename + f'_frame{frameno}.png'
            img.imsave(_, frame, vmin=0, vmax=255)
            outfiles.append(_)

        return outfiles, images
    else:
        return images
      

if __name__ == '__main__':
    import requests
    import matplotlib.pyplot as plt
    fileurl = 'https://gifdb.com/images/high/jumping-cat-typing-on-keyboard-2b15r60jnh8hn5sv.gif'
    response = requests.get(fileurl)
    filename = './example.gif'
    with open(filename, 'wb') as fo:
      fo.write(response.content)
    frames = gif2imgs(filename)
    plt.imshow(frames[0])

Recently I wanted to make a video for a talk. And that is where the trouble started.

Initially, I thought of placing a video and a GIF side by side. However, getting synchronous playback was tricky because there are few to no control options for GIFs in most presentation software. As a result, I had to extract the frames from the GIF and convert them into a video instead. While I could have done this with any of many online services, I specifically needed synchronised playback as the timing of the content had to match. The code above does exactly this frame extraction and pairs up nicely with another utility, img2vid (which converts image files to a video with a choosable frame rate).

Simply set the option to save the frames and then run img2vid on the single images to make the video; then you can use it with controls in your presentation software, be it PowerPoint or anything else.
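As a minimal sketch of that workflow (img2vid is the separate utility mentioned above, so the call at the end is only indicated as a comment and its exact signature is an assumption):

outfiles, frames = gif2imgs('./example.gif', save=True)
# outfiles now holds ./example_frame0.png, ./example_frame1.png, ...
# feed the single frames to img2vid (signature assumed):
# img2vid(outfiles, fps=10, outfile='./example.mp4')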

The above main part gets you an example GIF (it had to be a cat, right?) and shows you the first frame. Enjoy!

Parsivel 2 sampling via linux

TL;DR;Summary

Sample the OTT Parsivel2 on a Linux device via Python and output data as ASDO-CSV and/or netCDF4 by using this code:

#!/bin/python3
import os
import time

import datetime
import serial
import subprocess

import numpy as np
import netCDF4 as nc

class parsivel_via_serial(serial.Serial):
    def __init__(self,
                 # serial port parameters
                 # moxa is created as virtual USB0
                 # to set it up, most likely the RS-485 2W mode has to be
                 # activated by running
                 # setserial /dev/ttyUSB0 port 1
                 # for this (and for this program) to work, the user that runs
                 # this file needs to be in the dialout group (on Fedora),
                 # which can be done via the command:
                 # sudo usermod -a -G dialout $USER
                 # BE AWARE, a logout/reboot may be required
                 port='/dev/ttyUSB0',
                 baudrate = 57600,
                 ncmeta={'Station_Name': 'Eriswil (Kt. Bern, Switzerland)',
                         'latitude': 47.07051,
                         'longitude': 7.87254,
                         'altitude': 921,
                         'Sensor_ID': 411994,
                         'Title': 'CLOUDLAB (cloudlab.ethz.ch) disdrometer data from OTT Parsivel-2',
                         'Institution': 'ETH Zurich',
                         'Contact': "Jan Henneberger, janhe@ethz.ch;\n \
                                     Robert Spirig, rspirig@ethz.ch;\n \
                                     Fabiola Ramelli, ramellif@ethz.ch",
                         "Author": 'Robert Spirig, rspirig@ethz.ch',
                 },
                 outpath='./',
                 stationname='Eriswil',
                 quiet=True,
                 ):
        # activate RS-485 2W mode on the (virtual) serial port
        cmd = ['setserial', port, 'port', '1']
        res = subprocess.call(cmd)
        print(f'{cmd} call resulted in {res}')
        if res != 0:
            print('Setserial of the port to RS485-2W failed, ensure the serial adapter driver is installed')
            return

        # inherit init from serial and open the port
        super().__init__(port=port, baudrate=baudrate)
        self.quiet = quiet
        # a bytebuffer to hold the answer from the parsivel
        self.buffer = b''
        # how to decode the bytes to a sensible string, generally UTF-8 is preferred
        self.codec = 'utf-8'
        # what to ask the parsivel, PA is the easiest, even if it is more than required
        self.pollcmd = b'CS/PA\r'
        # usual pollcmd for user telegram would be CS/P\r
        #pollcmd = b"CS/P\r"

        # for automatic polling, the time resolution in seconds
        self.samplinginterval = 10

        # when to stop sampling, can be synced with crontab
        # and defaults to 15 minutes
        self.maxsampling = 60 * 15 
        # where to store the data
        self.outpath = outpath
        # the prefix for the file to be used
        self.fileprefix = 'parsivel_'
        self.stationname = stationname[:10]
        # holder for current file, will be filled by subroutines
        self.ncfile = ''
        self.csvfile = ''
        # holder for all written files, will be filled by subroutines
        self.csvfiles = []
        self.ncfiles = []
        # dict to hold data order by variable
        self.data = {'-1': []}
        # to keep track of whether we expect data in the buffer
        self.polled = False
        # for waiting a tenth of a second for new bytes in the buffer
        self.waitdt = 0.1
        # to keep track of the waiting time
        self.waittime = 0
        # the upper limit of waiting
        self.maxwait = 3
        # increment buffersize to hold more than one record, maybe useless
        self.ReadBufferSize = 2**16
        # default output order, ASDO compatible
        self.csvoutputorder = ['21','20', '01', '02', '03', '05', '06', '07',
                            '08', '10', '11', '12', '16', '17', '18', '34', '35', '93']
        # default output header, ASDO compatible
        self.csvheader = ['Date', 'Time', 'Intensity of precipitation (mm/h)', 'Precipitation since start (mm)', 'Weather code SYNOP WaWa',]
        self.csvheader += ['Weather code METAR/SPECI', 'Weather code NWS', 'Radar reflectivity (dBz)', 'MOR Visibility (m)', ]
        self.csvheader += ['Signal amplitude of Laserband', 'Number of detected particles', 'Temperature in sensor (°C)', ]
        self.csvheader += ['Heating current (A)', 'Sensor voltage (V)', 'Optics status', 'Kinetic Energy', 'Snow intensity (mm/h)', 'Spectrum']
        # add meta info for ncfile
        self.ncmeta = {
                       'Source': 'OTT Parsivel-2 optical disdrometer',
                       'History': 'Data acquired with MOXA USB converter',
                       'Dependencies': 'external',
                       'Conventions': 'CF-1.6 where applicable',
                       'Comment': "Manual of the OTT Parsivel-2 can be found online" \
                                  " at https://www.ott.com",
                       "Licence": "For non-commercial use only. Any usage of the data"\
                                  " should be reported to the contact person(s).",
                     }

        self.ncmapping = {'09': 'interval',
                          '25': 'error_code',
                          '16': 'I_heating',
                          '17': 'V_sensor',
                          '18': 'state_sensor',
                          '10': 'sig_laser',
                          '01': 'rainfall_rate',
                          #'02': 'RR_total',
                          '03': 'synop_WaWa',
                          '04': 'synop_WW',
                          '07': 'radar_reflectivity',
                          '08': 'visibility',
                          '12': 'T_sensor',
                          '11': 'n_particles',
                          #'24': 'RR_accum',
                          '34': 'E_kin',
                          '90': 'number_concentration',
                          '91': 'fall_velocity',
                          '93': 'data_raw',
                         }

        self.nctransformation = {'01': lambda x: x * 60 * 60 / 1000,
                                 '12': lambda x: x + 273.15,
                         }

        # add any other information from ncmeta
        for key, value in ncmeta.items():
           #if key.lower() in ['name', 'location']:
           #    key = f'Station_{key}'
           self.ncmeta[key] = value

        if not self.isOpen:
            self.open()

        self.flush()

    def __del__(self):
        self.close()
        time.sleep(1)

    def settime(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        now = datetime.datetime.now(datetime.UTC)
        cmd = b'CS/T/'+bytes(now.strftime('%H:%M:%S\r').encode(self.codec))
        if not self.quiet:
            print('Sending settime command ', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print('Answer to settime from parsivel was ', answer)

        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def gettime(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        cmd = b'CS/T\r'
        if not self.quiet:
            print('Sending gettime command ', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print('Answer to gettime from parsivel was ', answer)
        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def setdate(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        now = datetime.datetime.now(datetime.UTC)
        cmd = b'CS/D/'+bytes(now.strftime('%d.%m.%Y\r').encode(self.codec))
        if not self.quiet:
            print('Sending setdate command to parsivel ', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print('Answer to setdate from parsivel was ', answer)
        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def getdate(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        cmd = b'CS/D\r'
        if not self.quiet:
            print('Requesting date via ', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print('Answer to getdate from parsivel was ', answer)
        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def setrtc(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        now = datetime.datetime.now(datetime.UTC)
        cmd = b'CS/U/'+bytes(now.strftime('%d.%m.%Y %H:%M:%S\r').encode(self.codec))
        if not self.quiet:
            print('Sending setrtc command', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print('Answer to setrtc from parsivel was ', answer)
        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def getrtc(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        cmd = b'CS/U\r'
        if not self.quiet:
            print('Sending getrtc command to parsivel ', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print('Answer to getrtc from parsivel was ', answer)
        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def setstationname(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        # max of 10 letter allowed
        sname = self.stationname[:10]
        cmd = b'CS/K/'+bytes(sname.encode(self.codec))
        if not self.quiet:
            print('Sending setstationname command to parsivel', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print(f'Answer to setstationname ({sname}) from parsivel was ', answer)
        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def getstationname(self):

        if not self.isOpen:
            self.open()
            time.sleep(1)

        time.sleep(0.2)
        # max of 10 letter allowed
        sname = self.stationname[:10]
        cmd = b'CS/K\r'
        if not self.quiet:
            print('Sending getstationname command to parsivel', cmd)
        update = self.write(cmd)
        self.flush()
        time.sleep(2)
        answer = b''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            print('Answer to getstationname from parsivel was ', answer)
        self.flush()
        return answer.strip(b'\r\nOK\r\n\n').decode(self.codec).strip()

    def setdatetime(self):
        if not self.isOpen:
            self.open()
            time.sleep(1)

        self.setrtc()
        self.setdate()
        self.settime()

    def setup(self):
        if not self.isOpen:
            self.open()
            time.sleep(1)

        #sname = self.getstationname()
        self.setstationname()
        self.setdatetime()
        self.flush()

    def pollcode(self, code):
        if not self.isOpen:
            self.open()
            time.sleep(1)

        self.flush()
        self.clearbuffer()

        thispollcmd = str(code)

        if int(thispollcmd) >= 90:
            delim = ';'
            sleeptime = 1
        else:
            delim = ''
            sleeptime = 1

        thispollcmd = (thispollcmd+delim).encode(self.codec)
        pollcmd = b'CS/R/' + bytes(thispollcmd)+b'\r\n'
        written = self.write(pollcmd)
        #self.flush()
        # according to manual there is a guarantee that the parsivel answers within 500 ms
        # so we wait here to ensure the buffer is full
        time.sleep(sleeptime)
        self.polled = True
        while self.in_waiting == 0 or self.waittime <= sleeptime:

            time.sleep(self.waitdt)
            self.waittime += self.waitdt

            if self.waittime > self.maxwait:
                if not self.quiet:
                    print(f'Breaking out of waiting for answer on serial as no data arrived after {self.maxwait} seconds!!!')
                break

        answer = ''
        if self.in_waiting > 0:
            answer = self.read_until() #size=self.in_waiting)
            answer = answer.decode(self.codec)

        self.flush()
        self.polled = False

        return answer

    def help(self):
        if not self.isOpen:
            self.open()
            time.sleep(1)

        self.flush()
        self.clearbuffer()
        pollcmd = b'CS/?\r\n'
        written = self.write(pollcmd)
        self.flush()
        # according to manual there is a guarantee that the parsivel answers within 500 ms
        # so we wait here to ensure the buffer is full
        time.sleep(1)
        while self.in_waiting == 0 or self.waittime <= 0.5:

            time.sleep(self.waitdt)
            self.waittime += self.waitdt

            if self.waittime > self.maxwait:
                if not self.quiet:
                    print(f'Breaking out of waiting for answer on serial as no data arrived after {self.maxwait} seconds!!!')
                break

        answer = ''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            answer = answer.decode(self.codec)

        print(answer)


    def getconfig(self):
        if not self.isOpen:
            self.open()
            time.sleep(1)

        self.clearbuffer()
        pollcmd = b'CS/L\r' 
        written = self.write(pollcmd)
        # according to manual there is a guarantee that the parsivel answers within 500 ms
        # so we wait here to ensure the buffer is full
        time.sleep(0.5)
        while self.in_waiting == 0 or self.waittime <= 0.5:

            time.sleep(self.waitdt)
            self.waittime += self.waitdt

            if self.waittime > self.maxwait:
                if not self.quiet:
                    print(f'Breaking out of waiting for answer on serial as no data arrived after {self.maxwait} seconds!!!')
                break

        answer = ''
        if self.in_waiting > 0:
            answer = self.read(size=self.in_waiting)
            answer = answer.decode(self.codec)
            #answer = answer.split('\r\n')

        print(answer)
        return answer

    def poll(self):
        if not self.isOpen:
            self.open()
            time.sleep(1)

        self.flush()
        self.clearbuffer()

        written = self.write(self.pollcmd)
        # according to manual there is a guarantee that the parsivel answers within 500 ms
        # so we wait here to ensure the buffer is full
        time.sleep(0.5)
        self.polled = True

    def clearbuffer(self):
        # reset buffer in any case
        self.buffer = b''

    def cleardata(self):
        # cleanup data dict after we've written out everything usually
        self.data = {'-1': []}

    def clear(self):
        self.clearbuffer()
        self.cleardata()

    def velocity_classes(self):
        """
        Return arrays of relevant velocity classes for use with TROPOS nc.

        Hardcoded velocity bins of parsivel are used to construct:
            1. velocitybin The sizes as lower -> upper edge
            2. velocities as the the average velocity of a bin
            3. the raw_velocities that have been used to construct the above 2

        Returns
        -------
        velocitybin : array of float
            The droplet sizes .
        velocities : array of float
            The bin widths as difference to lower and upper edge for each bin.
        raw_velocities : array of float
            The bin widths (raw as per manual).

        """

        raw_velocities = [0.0] + \
            [0.1] * 10 + \
            [0.2] * 5 + \
            [0.4] * 5 + \
            [0.8] * 5 + \
            [1.6] * 5 + \
            [3.2] * 2

        velocities = np.asarray([(raw_velocities[i] + raw_velocities[i + 1])/2
                      for i in range(len(raw_velocities[:-1]))])

        velocitybin = np.cumsum(velocities)

        return velocitybin, velocities, np.asarray(raw_velocities)

    def diameter_classes(self, asmeters=True):
        """
        Return arrays of relevant diameter classes for use with TROPOS nc.

        Hardcoded bin widths of parsivel are used to construct:
            1. dropletsizes The sizes as lower -> upper edge
            2. dropletwidth as the the average of upper/lower edge
            3. the sizes that have been used to construct the above 2

        Returns
        -------
        dropletsizes : array of float
            The droplet sizes .
        dropletwidths : array of float
            The bin widths as difference to lower and upper edge for each bin.
        raw_dropletwidths : array of float
            The bin widths (raw as per manual).

        """

        raw_dropletwidths = [0.0] + \
            [0.125] * 10 + \
            [0.250] * 5 + \
            [0.500] * 5 + \
            [1] * 5 + \
            [2] * 5 + \
            [3] * 2

        dropletwidths = [(raw_dropletwidths[i]+raw_dropletwidths[i+1])/2
                         for i in range(len(raw_dropletwidths[:-1]))]

        dropletsizes = np.cumsum(dropletwidths)
        if asmeters:
            scaling = 1000
        else:
            scaling = 1
        return dropletsizes / scaling, np.asarray(dropletwidths) / scaling, np.asarray(raw_dropletwidths) / scaling

    # max sampling time in seconds (to be restarted by cronjob)
    def sample(self, writeoutfreq=None):
        self.setup()

        if writeoutfreq is None:
            writeoutfreq = self.samplinginterval

        if writeoutfreq % self.samplinginterval != 0:
            print(f'Writeoutfreq has been adjusted to the nearest lower multiple of the samplinginterval {self.samplinginterval}')
            writeoutfreq = (writeoutfreq // self.samplinginterval) * self.samplinginterval

        self.reset_input_buffer()
        time.sleep(1)

        curdt = 0
        try:
            while curdt <= self.maxsampling or self.maxsampling < 0:
                self.getparsiveldata()
                #if curdt % 60 == 0:
                if curdt % writeoutfreq == 0:
                    self.write2file()
                time.sleep(self.samplinginterval)
                curdt += self.samplinginterval
        except serial.SerialException:
            print('Issue with serial connection encountered, rerun...')
        except KeyboardInterrupt:
            print('Sampling interrupted.')

    def getparsiveldata(self):
        if not self.isOpen:
            self.open()

        now = datetime.datetime.now(datetime.UTC)

        self.flush()
        time.sleep(0.1)

        if not self.polled:
            self.poll()

        while self.in_waiting > 0 or self.waittime <= 0.6:
            self.buffer += self.read(size=self.in_waiting)

            curbytes = self.in_waiting

            time.sleep(self.waitdt)
            self.waittime += self.waitdt

            if self.waittime > self.maxwait:
                if not self.quiet:
                    print(f'Breaking out of waiting for answer on serial as no data arrived after {self.maxwait} seconds!!!')
                break

            # since we check after the time.sleep we can assume if there is nothing new that we are done
            if curbytes == self.in_waiting and curbytes < 1:
                if not self.quiet:
                    print(f'Breaking out of waiting for answer on serial as no new data has arrived after one more time step of {self.waitdt} after {self.waittime} seconds!')
                break
        else:
            if len(self.buffer) == 0:
                if not self.quiet:
                    print(f'No bytes were available to read after {self.maxwait} seconds and we waited {self.waittime} seconds for an answer. ')
            elif len(self.buffer) > 1:
                if not self.quiet:
                    print(f'{len(self.buffer)} bytes have been read in {self.waittime} seconds. ')
            else:
                pass

        if not self.quiet:
            print('Received the following answer to poll:\n', self.buffer)

        self.waittime = 0

        self.polled = False

        # convert to sensible string
        record = self.buffer.strip(b'\x03').decode(self.codec).strip()
        # get different fields into list
        record = record.split('\r\n')
        # split into measurement value key and measurement value
        # the default return is CODE (2 letters): data (until the previously removed \r\n)
        record = {i[:2]: i[3:].rstrip(';').strip() for i in record[1:]}

        for key, value in sorted(record.items()):
            # maintenance codes
            if key in ['94', '95', '96', '97', '98', '99']:
                continue

            # build up the dict to hold the available data
            if key not in self.data:
                self.data[key] = []

            # date, time, software versions that should not be converted
            # as well as synop codes, sensor date/time and measuring start 
            # as we handle these ourselves
            if key in ['20', '21', '14', '15', '05', '06', '19', '21', '22']:
                pass

            # spectra data
            elif key in ['90', '91',  '93']:
                value = value.replace('000','')

                if value.count(';') == len(value):
                    value = np.zeros((32, 32))
                else:
                    # spectra numbers are int
                    if key in ['93']:
                        value = [int(i) if i else 0 for i in value.split(';')]
                    # others are float
                    elif key in ['90', '91']:
                        value = [float(i) if i else 0 for i in value.split(';')]

                    value = np.asarray(value)
                    if key in ['93']:
                        try:
                            value = value.reshape(32, 32)
                        except ValueError:
                            print(value.shape)
                            # fake data of the right format as apparently the serial comm was interrupted
                            # this is usually the case if the script is running as several instances
                            _value = np.zeros((32,32))
                            value = _value.flatten()[:len(value)] + value
            else:
                # float
                if '.' in value and value.count('.') == 1:
                    value = float(value)
                else:
                    # maybe integer?
                    try:
                        value = int(value)
                    # neither float nor integer, maybe a weather code, like wawa
                    except ValueError:
                        print(f'Conversion to int failed for {value}, based on {key}')
                        pass

            self.data[key] += [value]

        # replace sensor time with system time
        # 21 = date, 20 = time
        if '20' in self.data and '21' in self.data:
            self.data['21'][-1] = now.strftime('%d.%m.%Y')
            self.data['20'][-1] = now.strftime('%H:%M:%S')
            # keep unix time separate
            self.data['-1'] += [datetime.datetime.timestamp(now)]
        else:
            print(f'Issue with sampling as no date or time was passed in the serial buffer, skipping {now} and cleaning up buffer')
            print(record, self.data)
            for key, value in sorted(record.items()):
                if len(self.data[key]) >= 1:
                    self.data[key].pop()

        # cleanup buffer in any case
        self.clearbuffer()

    def write2file(self, *args, **kwargs):
        self.write2asdofile(*args, **kwargs)
        self.write2ncfile(*args, **kwargs)
        self.clear()

    def _setupncfile(self):
        if os.path.exists(self.ncfile):
            nchandle = nc.Dataset(self.ncfile, 'a', format='NETCDF3_CLASSIC')
            return nchandle

        if not self.quiet:
            print(f'Setting up {self.ncfile}')

        nchandle = nc.Dataset(self.ncfile, 'w', format='NETCDF3_CLASSIC')

        nchandle.createDimension('time', None)
        nchandle.createDimension('diameter', 32)
        nchandle.createDimension('velocity', 32)
        nchandle.createDimension('nv', 2)

        for key, value in self.ncmeta.items():
            setattr(nchandle, key, value)

        now = datetime.datetime.now(datetime.UTC)
        setattr(nchandle, "Processing_date", str(datetime.datetime.now(datetime.UTC)) + ' (UTC)')

        datavar = nchandle.createVariable('lat', 'd', ())
        setattr(datavar, 'standard_name', 'latitude')
        setattr(datavar, 'long_name', 'Latitude of instrument location')
        setattr(datavar, 'units', 'degrees_north')
        datavar.assignValue(self.ncmeta['latitude'])

        datavar = nchandle.createVariable('lon', 'd', ())
        setattr(datavar, 'standard_name', 'longitude')
        setattr(datavar, 'long_name', 'Longitude of instrument location')
        setattr(datavar, 'units', 'degrees_east')
        datavar.assignValue(self.ncmeta['longitude'])

        datavar = nchandle.createVariable('zsl', 'd', ())
        setattr(datavar, 'standard_name', 'altitude')
        setattr(datavar, 'long_name',
                'Altitude of instrument sensor above mean sea level')
        setattr(datavar, 'units', 'm')
        datavar.assignValue(self.ncmeta['altitude'])

        datavar = nchandle.createVariable('time', 'i', ('time',))
        setattr(datavar, 'standard_name', 'time')
        setattr(datavar, 'long_name',
                'Unix time at start of data transfer in seconds after 00:00 UTC on 1/1/1970')
        setattr(datavar, 'units', 'seconds since 1970-01-01 00:00:00')
        setattr(datavar, 'bounds', 'time_bnds')
        setattr(datavar, 'comment',
                'Time on data acquisition pc at initialization of serial connection to Parsivel.')

        datavar = nchandle.createVariable('time_bnds', 'i', ('time', 'nv'))
        setattr(datavar, 'standard_name', 'Measurement interval bounds') # time_bnds
        setattr(datavar, 'long_name', 'Timespan of the measurement interval')
        setattr(datavar, 'units', 's')
        setattr(datavar, 'comment', 'Upper and lower bounds of measurement interval.')

        datavar = nchandle.createVariable('interval', 'i', ('time',))
        setattr(datavar, 'standard_name', 'Time interval')  # interval
        setattr(datavar, 'long_name', 'Length of measurement interval')
        setattr(datavar, 'units', 's')
        setattr(datavar, 'comment',
                'Variable 09 - Sample interval between two data retrieval requests.')


        diameters  = self.diameter_classes()
        datavar = nchandle.createVariable('diameter', 'd', ('diameter',))
        setattr(datavar, 'standard_name', 'Particle diameter') # diameter
        setattr(datavar, 'long_name', 'Center diameter of precipitation particles')
        setattr(datavar, 'units', 'm')
        setattr(datavar, 'comment',
                'Predefined diameter classes. Note the variable bin size.')
        datavar[:] = diameters[0]

        datavar = nchandle.createVariable('diameter_spread', 'd', ('diameter',))
        setattr(datavar, 'standard_name', 'Particle spread') # diameter spread
        setattr(datavar, 'long_name', 'Width of diameter interval')
        setattr(datavar, 'units', 'm')
        setattr(datavar, 'comment', 'Bin size of each diameter class.')
        datavar[:] = (diameters[1])

        datavar = nchandle.createVariable('diameter_bnds', 'i', ('diameter', 'nv'))
        setattr(datavar, 'standard_name', 'Particle bounds') # diameter bnds
        setattr(datavar, 'long_name', 'Bounds of the diameter interval')
        setattr(datavar, 'units', 'm')
        setattr(datavar, 'comment', 'Upper and lower bounds of diameter interval.')
        datavar[:, :] = np.stack([np.cumsum(diameters[2][:-1]), np.cumsum(diameters[2][1:])]).T

        velocities = self.velocity_classes()

        datavar = nchandle.createVariable('velocity', 'd', ('velocity',))
        setattr(datavar, 'standard_name', 'Fall velocity') # veloc
        setattr(datavar, 'long_name',
                'Center fall velocity of precipitation particles')
        setattr(datavar, 'units', 'm s-1')
        setattr(datavar, 'comment',
                'Predefined velocity classes. Note the variable bin size.')
        datavar[:] = (velocities[0])

        datavar = nchandle.createVariable('velocity_spread', 'd', ('velocity',))
        setattr(datavar, 'standard_name', 'Fall velocity spread') # vel spred
        setattr(datavar, 'long_name', 'Width of velocity interval')
        setattr(datavar, 'units', 'm')
        setattr(datavar, 'comment', 'Bin size of each velocity interval.')
        datavar[:] = (velocities[1])

        datavar = nchandle.createVariable('velocity_bnds', 'd', ('velocity', 'nv'))
        setattr(datavar, 'standard_name', 'Fall velocity bounds') # vel bnds
        setattr(datavar, 'comment', 'Upper and lower bounds of velocity interval.')
        datavar[:, :] = np.stack([np.cumsum(velocities[2][:-1]), np.cumsum(velocities[2][1:])]).T


        datavar = nchandle.createVariable(
            'data_raw', 'd', ('time', 'diameter', 'velocity',), fill_value=-999.)
        setattr(datavar, 'standard_name', 'Particle count per velocity and diameter bin') # data_raw
        setattr(datavar, 'long_name',
                'Raw Data as a function of particle diameter and velocity')
        setattr(datavar, 'units', '1')
        setattr(datavar, 'comment', 'Variable 93 - Raw data.')

        datavar = nchandle.createVariable(
            'number_concentration', 'd', ('time', 'diameter',), fill_value=-999.)
        setattr(datavar, 'standard_name', 'Total particle count in time interval') # n_particles
        setattr(datavar, 'long_name', 'Number of particles per diameter class')
        setattr(datavar, 'units', 'log10(m-3 mm-1)')
        setattr(datavar, 'comment', 'Variable 90 - Field N (d)')

        datavar = nchandle.createVariable(
            'fall_velocity', 'd', ('time', 'diameter',), fill_value=-999.)
        setattr(datavar, 'standard_name', 'Fall velocity') # fall velocity
        setattr(datavar, 'long_name', 'Average velocity of each diameter class')
        setattr(datavar, 'units', 'm s-1')
        setattr(datavar, 'comment', 'Variable 91 - Field v (d)')

        datavar = nchandle.createVariable('n_particles', 'i', ('time',))
        setattr(datavar, 'standard_name', 'Total particle count in time interval') # n particles
        setattr(datavar, 'long_name', 'Number of particles in time interval')
        setattr(datavar, 'units', '#')
        setattr(datavar, 'comment', 'Variable 11 - Number of detected particles')

        datavar = nchandle.createVariable(
            'rainfall_rate', 'd', ('time',), fill_value=-999.)
        setattr(datavar, 'standard_name', 'rainfall_rate')
        setattr(datavar, 'long_name', 'Precipitation rate')
        setattr(datavar, 'units', 'm s-1')
        setattr(datavar, 'comment', 'Variable 01 - Rain intensity (32 bit) 0000.000')

        datavar = nchandle.createVariable(
            'radar_reflectivity', 'd', ('time',), fill_value=-999)
        setattr(datavar, 'standard_name', 'equivalent_reflectivity_factor')
        setattr(datavar, 'long_name', 'equivalent radar reflectivity factor')
        setattr(datavar, 'units', 'dBZ')
        setattr(datavar, 'comment', 'Variable 07 - Radar reflectivity (32 bit).')

        datavar = nchandle.createVariable('E_kin', 'd', ('time',), fill_value=-999.)
        setattr(datavar, 'standard_name', 'Kinetic energy') # ekin
        setattr(datavar, 'long_name', 'Kinetic energy of the hydrometeors')
        setattr(datavar, 'units', 'kJ')
        setattr(datavar, 'comment', 'Variable 24 - kinetic Energy of hydrometeors.')

        datavar = nchandle.createVariable(
            'visibility', 'i', ('time',), fill_value=-999)
        setattr(datavar, 'standard_name', 'Visibility') # visi
        setattr(datavar, 'long_name', 'Visibility range in precipitation after MOR')
        setattr(datavar, 'units', 'm')
        setattr(datavar, 'comment',
                'Variable 08 - MOR visibility in the precipitation.')

        datavar = nchandle.createVariable(
            'synop_WaWa', 'i', ('time',), fill_value=-999)
        setattr(datavar, 'standard_name', 'Synop Code WaWa') # synop wawa
        setattr(datavar, 'long_name', 'Synop Code WaWa')
        setattr(datavar, 'units', '1')
        setattr(datavar, 'comment',
                'Variable 03 - Weather code according to SYNOP wawa Table 4680.')

        datavar = nchandle.createVariable(
            'synop_WW', 'i', ('time',), fill_value=-999)
        setattr(datavar, 'standard_name', 'Synop Code WW') # synop ww
        setattr(datavar, 'long_name', 'Synop Code WW')
        setattr(datavar, 'units', '1')
        setattr(datavar, 'comment',
                'Variable 04 - Weather code according to SYNOP ww Table 4677.')

        datavar = nchandle.createVariable(
            'T_sensor', 'i', ('time',), fill_value=-999)
        setattr(datavar, 'standard_name', 'Temperature') # T_sensor
        setattr(datavar, 'long_name', 'Temperature in the sensor')
        setattr(datavar, 'units', 'K')
        setattr(datavar, 'comment', 'Variable 12 - Temperature in the Sensor')

        datavar = nchandle.createVariable('sig_laser', 'i', ('time',))
        setattr(datavar, 'standard_name', 'Laser signal amplitude') # sig_laser
        setattr(datavar, 'long_name', 'Signal amplitude of the laser')
        setattr(datavar, 'units', '1')
        setattr(datavar, 'comment',
                'Variable 10 - Signal amplitude of the laser strip')

        datavar = nchandle.createVariable('state_sensor', 'i', ('time',))
        setattr(datavar, 'standard_name', 'State of the Sensor') # v_sensor
        setattr(datavar, 'long_name', 'State of the Sensor')
        setattr(datavar, 'units', '1')
        setattr(datavar, 'comment', 'Variable 18 - Sensor status:\n'\
                                    '0: Everything is okay.\n' \
                                    '1: Dirty but measurement possible.\n'\
                                    '2: No measurement possible')

        datavar = nchandle.createVariable('V_sensor', 'd', ('time',))
        setattr(datavar, 'standard_name', 'Sensor Voltage') # v_sensor
        setattr(datavar, 'long_name', 'Sensor Voltage')
        setattr(datavar, 'units', 'V')
        setattr(datavar, 'comment', 'Variable 17 - Power supply voltage in the sensor.')

        datavar = nchandle.createVariable('I_heating', 'd', ('time',))
        setattr(datavar, 'standard_name', 'Heating Current') # I_heating
        setattr(datavar, 'long_name', 'Heating Current')
        setattr(datavar, 'units', 'A')
        setattr(datavar, 'comment', 'Variable 16 - Current through the heating system.')

        datavar = nchandle.createVariable('error_code', 'i', ('time',))
        setattr(datavar, 'standard_name', 'Error code') # Errors codes
        setattr(datavar, 'long_name', 'Error Code')
        setattr(datavar, 'units', '1')
        setattr(datavar, 'comment', 'Variable 25 - Error code.')

        return nchandle




    def write2ncfile(self, intosubdirs=True, ):

        if self.data and '21' in self.data:
            pass
        else:
            if not self.quiet:
                print('No data have been read yet. Call getparsiveldata() first.')
            return

        if not self.outpath.endswith(os.sep):
            self.outpath += os.sep

        os.makedirs(self.outpath, exist_ok=True)
        if '21' in self.data:
            pass
        else:
            print('No records to write yet in self.data')
            return
        if '21' in self.data and len(self.data['21']) > 0:
            pass
        else:
            print('No records to write yet in self.data')
            return

        udays = sorted(list(set(self.data['21'])))
        for day in udays:
            if intosubdirs:
               ymd = day.split('.')[::-1]
               ymd = [i + j for i, j in zip(['Y', 'M', 'D'], ymd)]
               _outpath = self.outpath+os.sep.join(ymd)+os.sep
               os.makedirs(_outpath, exist_ok=True)
            else:
                _outpath = self.outpath

            # day has the format d.m.Y but we want the filename to be Ymd
            outfile = self.fileprefix +''.join(day.split('.')[::-1]) + '.nc'
            self.ncfile = _outpath + outfile
            nchandle = self._setupncfile()
            setattr(nchandle, 'Date', day)

            index_of_day = [i[0] for i in enumerate(self.data['21']) if i[1] == day]

            curtimestep = nchandle.dimensions['time'].size

            unixtime = ([self.data['-1'][i] for i in index_of_day])
            nchandle.variables["time"][curtimestep] = (unixtime)
            bnds = [[self.data['-1'][i] - int(self.data['09'][i]), self.data['-1'][i]] for i in index_of_day]
            nchandle.variables['time_bnds'][curtimestep, :] = (bnds)

            varNames = nchandle.variables.keys()

            for ncvar in self.ncmapping:
                thisvar = nchandle.variables[self.ncmapping[ncvar]]
                thisdata = [self.data[ncvar][i] for i in index_of_day]

                if ncvar in self.nctransformation:
                    thisdata = [self.nctransformation[ncvar](i) for i in thisdata]

                if len(thisvar.shape) == 1:
                    thisvar[curtimestep] = (thisdata)
                elif len(thisvar.shape) == 2:
                    thisvar[curtimestep, :] = (thisdata)
                elif len(thisvar.shape) == 3:
                    thisdata = np.asarray(thisdata).reshape(thisvar.shape[1:])
                    thisvar[curtimestep, :, :] = (thisdata)

            nchandle.close()
            now = datetime.datetime.now(datetime.UTC)

            print(f'Written {len(index_of_day)} records of data to {self.ncfile} at {now}')
            self.ncfiles = list(set(self.ncfiles+[self.ncfile]))
        pass

    # order can be anything, but defaults to ASDO format, see header in below function
    def write2asdofile(self, intosubdirs=True, varorder=[], header=[]):
        assert len(varorder) == len(header), 'Order of variables and header have to match'

        if self.data:
            pass
        else:
            print('No data have been read yet. Call getparsiveldata() first.')
            return

        if varorder:
            self.csvoutputorder = varorder

        if header:
            self.csvheader = header

        if not self.outpath.endswith(os.sep):
            self.outpath += os.sep

        os.makedirs(self.outpath, exist_ok=True)
        filemode = 'a'

        # examples ASDO file
        #04.03.2023,00:00:00,0.000,216.57,0,NP,C,-9.999,20000,19866,0,-1,0.64,23.8,0,0.000,0,<SPECTRUM>ZERO</SPECTRUM>
        #2023.03.28,14:30:58,0.0,33.06,0,NP,C,-9.999,20000,21649,0,16,0.0,23.8,0,0.0,0.0,<SPECTRUM></SPECTRUM>
        # 25.02.2023,00:05:30,3.100,210.07,62,RA,R,32.661,8290,16569,90,1,0.00,23.8,0,53.310,0,
        # <SPECTRUM>,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
        # ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
        # ,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
        #,,,,,,,,,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
        #,1,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5,2,2,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
        #4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,1,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,1,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,1,,3,
        #1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,4,10,7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3,8,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4,
        #2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,,,,,,
        #,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
        #,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,</SPECTRUM>
        if '21' in self.data:
            pass
        else:
            print('No records to write yet in self.data')
            return
        if '21' in self.data and len(self.data['21']) > 0:
            pass
        else:
            print('No records to write yet in self.data')
            return
        udays = sorted(list(set(self.data['21'])))

        for day in udays:
            if intosubdirs:
               ymd = day.split('.')[::-1]
               ymd = [i + j for i, j in zip(['Y', 'M', 'D'], ymd)]
               _outpath = self.outpath+os.sep.join(ymd)+os.sep
               os.makedirs(_outpath, exist_ok=True)
            else:
                _outpath = self.outpath

            # day has the format d.m.Y but we want the filename to be Ymd
            self.csvfile = self.fileprefix +''.join(day.split('.')[::-1]) + '.csv'
            writeheader = True

            # write out the buffer to file
            if os.path.exists(_outpath + self.csvfile):
                writeheader = False

            # maxtimesteps because self.data holds everything
            ntimesteps = len(self.data['20'])

            with open(_outpath+self.csvfile, filemode) as fo:

                if writeheader:
                     fo.write(','.join(self.csvheader))
                     fo.write('\n')

                for timestep in range(ntimesteps):
                    # skip if not the same day
                    if self.data['21'][timestep] != day:
                        continue
                    for key in self.csvoutputorder:
                        varrec = self.data[key][timestep]
                        if key in '93':
                            fo.write('<SPECTRUM>')

                        if key in ['90', '91', '93']:
                            if not isinstance(varrec, str):
                                varrec = ','.join([str(i) if i > 0 else '' for i in varrec.flatten()])

                            if len(varrec) == varrec.count(','):
                                varrec = 'ZERO'
                            else:
                                varrec += ','

                        fo.write(str(varrec))

                        if key in '93':
                            fo.write('</SPECTRUM>')
                        else:
                            fo.write( ',')

                    fo.write('\n')
            self.csvfiles = list(set(self.csvfiles+[self.csvfile]))
            if not self.quiet:
                print(f'Written {ntimesteps} records to {_outpath+self.csvfile} for {day}')

if __name__ == '__main__':
    parsivel = parsivel_via_serial(outpath='./',)
    #parsivel.help()
    #parsivel.pollcode(33)
    #parsivel.pollcode(93)
    try:
        parsivel.sample()
    except KeyboardInterrupt:
        print('Sampling interrupted.')

    #time.sleep(1)
    #cfg = parsivel.getconfig()
    #print(cfg)
    #time.sleep(1)
    #curdt = 0
    #try:
    #    while curdt <= maxdt :
    #        parsivel.getparsiveldata()
    #        #if curdt % 60 == 0:
    #        parsivel.write2file()
    #        time.sleep(dt)
    #        curdt += dt
    #except serial.SerialException:
    #    print('Issue with serial connection encounted, rerun...')
    #    del parsivel
    #finally:
    #    del parsivel

In my last project, I had to sample data coming via serial (RS-485 2W) from an OTT Parsivel2. Initially, I connected it to a Windows laptop via a Moxa UPort-1150 serial-to-USB adapter and let ASDO do the sampling. However, I often ran into issues, namely that ASDO would simply crash. Restarting it required me to remotely connect to the laptop, kill ASDO via the task manager and start it again. Depending on how soon I realised the issue (usually at the latest the next morning, when I would receive an email), some hours to some days of data (when it happened on the weekend) would be gone, as the device does not store data.

Eventually, I got fed up with this manual approach, decided to program my own solution and moved the sampling to a Linux device (in this case the server, as that is what I had available). While the USB adapter required drivers that needed to be compiled first, once that was working I could sample the Parsivel with the code linked on top. The sampling can be set up via cronjob – I chose every 15 minutes so that there is no risk of time divergence (setup() sets the date and time of the Parsivel and is called before sampling) and in case there are some serial buffer issues.
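As a rough sketch of how such a cronjob-driven run could look (the cron entry, paths and write-out frequency below are assumptions, adjust them to your setup; maxsampling matches the 15-minute cron interval so the script stops before the next run starts):

# cron entry (path assumed): */15 * * * * /usr/bin/python3 /path/to/parsivel_sampling.py
parsivel = parsivel_via_serial(port='/dev/ttyUSB0',
                               outpath='/data/parsivel/',
                               stationname='Eriswil',
                               quiet=True)
parsivel.maxsampling = 60 * 15    # stop after 15 minutes, the cronjob restarts the script
parsivel.sample(writeoutfreq=60)  # poll every samplinginterval, write files every 60 s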

Minor SVG hack

TL;DR; Summary and code

This little script replaces certain properties in an SVG. By default, it only inspects the file you pass in and prints the matching properties without changing anything. Be aware that a wrong replacement can break your SVG; for this reason, the script writes the result to a new SVG by default instead of overwriting the original (a short usage sketch follows after the script). As always, the GitHub gist should be the up-to-date version.

Python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# ********** Careful, you may break SVG graphics with this************
# ********************** READ THE BELOW FIRST ************************
# ********************** No warranty is taken ************************
# By default nothing other than finding RGB codes in the SVG is done
# and only if you want to change something you can change what you want
# to change (inprop) to what it should be (outprop). For this to work
# set inspect to False.

def svg_property_replacer(file, 
                          path='',
                          inspect='rgb',
                          inprop='rgb(0%,0%,0%)',
                          outprop='rgb(100%,0%,0%)',
                          outfile='',
                          force=False):
    import os

    if not outfile:
        outfile = file[:-4]+'_replaced.svg'

    # does the file even exist?
    if not os.path.exists(path+file):
        # answer no, it doesn't, report it, return False
        print('File does not exist')
        return False
    elif not (path+file).endswith('.svg'):
        # answer yes, it does, but isn't an svg, report False
        print('File is not an SVG')
        return False
    else:
        # yes the file exists
        # read the file in
        with open(path+file) as fo:
            img = [line for line in fo.readlines()]

        if 'svg' not in img[0] and 'svg' not in img[1]:
            # it seemed like a svg, but is not a proper one, this may be
            # a mistake by the program that produced it in the first place
            print('Warning! No SVG tag has been found at the start of', file)
            print('For safety reasons only inspection is done.',
                  'Set force to override.')
            print('First lines were', img[0], img[1])
            if not force:
                inspect = 'svg'

        if inspect:
            # look for the inspect string in the lines and print them to stdout
            found = 0
            for _ in img:
                if inspect in _:
                    print(inspect+_.split(inspect)[1].split(';')[0])
                    found = 1
            return found

        else:
            # look for the inprop string in the lines and replace with outprop
            for _ in range(len(img)):
                img[_] = img[_].replace(inprop, outprop)

            # write the outfile as _replaced
            with open(path+outfile, 'w') as fo:
                fo.writelines(img)
        return outfile

if __name__ == '__main__':
    ###### EXAMPLE ###### 
    
    # This example turns the SVG into "darkmode", but be careful
    # the results may vary, depending on the original file, i.e. if there is 
    # some text in black it'll be white and that may not always make sense
    # the same for big white areas. Just try it out
    
    path = './'  # adjust this to wherever your SVG lives
    file = 'test.svg'
    file = path + file
    props = [['rgb(0%,0%,0%)','REPLACEMEAGAIN'],
             ['rgb(100%,100%,100%)', 'rgb(0%,0%,0%)'],
             ['REPLACEMEAGAIN', 'rgb(100%,100%,100%)'],
             ['<svg ', '<svg style="background-color: rgb(15%,15%,15%);" '],
             ]
    
    # file will be named the same with appended _dark
    outfile = file.split('.svg')
    outfile = outfile[0] + '_dark.svg'
    
    for inprop, outprop in props:
        # make another file first, so we can try out different things
        if [inprop,outprop] == props[0]:
            infile = file
        else:
            # then overwrite the same file always to just have one
            infile = outfile
    
        svg_property_replacer(infile,
                              outfile=outfile,
                              inprop=inprop,
                              outprop=outprop,
                              inspect=False,
                              )
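
Before replacing anything, it can be worth running the function in its default inspection mode to see which colour codes occur in the file at all. A minimal sketch, assuming the script above is saved as svg_property_replacer.py and that figure.svg is the file you want to look at (both names are placeholders):

from svg_property_replacer import svg_property_replacer

# default inspection mode: print every rgb(...) value found, change nothing
found = svg_property_replacer('figure.svg', inspect='rgb')
print('found something' if found else 'nothing found')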

Background & motivation

Say you have an SVG, either made by yourself, received from a colleague, or extracted from a PDF. This is a great starting point, and editing vector graphics is often far more comfortable than editing raster graphics, as you won’t have to touch single pixels. Now you could edit this file in Inkscape, Illustrator, or any of the many other software solutions. But if all you want to do is change a certain colour or remove some part of the graphics, you can do that without installing any software at all. Instead, you can open the SVG file with your text editor of choice and search for the colour you want to replace (note that colours in SVG files are generally given in hexadecimal form, i.e. #RRGGBB; a search for “colour picker” will generally help you figure out what to look for and what to pick as a replacement). It might be a good idea to back up your file beforehand as well. So let’s say you have the following rectangle in your SVG:

<rect
       style="opacity:0.89;fill:#ffffff;stroke-width:0.264583;stroke:#000000;stroke-opacity:1"
       id="rect165"
       width="50"
       height="35"
       x="90"
       y="80" />

This will be slightly transparent, filled white, with a solid black border (I added another rectangle just for show).

You can first search for “fill:” or for “stroke:” to see where colours are used in your file. As you can see, it can also be useful and simple to change the opacity or similar properties by simply editing the file and changing that one value, for example turning the red into light blue by replacing #ff0000 with #3399ff.
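
If you would rather not do the replacement by hand, the same single swap can be done with the script from above; a minimal sketch, again assuming it is importable from svg_property_replacer.py and that the drawing is called rectangles.svg (both placeholders):

from svg_property_replacer import svg_property_replacer

# replace red with light blue; the result is written to rectangles_replaced.svg
svg_property_replacer('rectangles.svg',
                      inprop='#ff0000',
                      outprop='#3399ff',
                      inspect=False)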

But most likely you want to change more than one occurrence and/or more than one property. This is where the above script comes in, as you can run it like the example at the bottom of the script and change many properties in a row (note that the REPLACEMEAGAIN placeholder is there so that we do not simply turn black into white and then, in the next step, turn everything that is now white – including the areas that were black originally – back to black).

Enjoy using it and easily changing your SVGs!