Road Map

From Mapping DH

Done

Data entry

  1. Add sample data for centers, initiatives, departments, working groups, journals, projects and institutions
  2. Add all associations that are members of ADHO and EADH, as well as additional associations
  3. Add all journals that are directly relevant to DH
  4. Add all centers from the centerNet list (with their city, country, institution)
  5. Add year of inception to all journals
  6. Add given name and last name to all academics.
  7. Add "in city: city" to all institutions; this is mostly missing, and it is one way to geolocate centers: via the location of their institution's city.
  8. Add all institutions included in the DHCR data with Wikidata ID and coordinate location, for referencing when adding courses.
  9. Import countries, institutions with cities, and disciplines (as field of work or separately?) from DHCR ahead of an import of DHCR data
  10. Add DHCR IDs (URL) to the programmes
  11. Import selected DHCR data: programme name, type (Bachelor, Master, Ph.D.), institution, URL to offer, disciplines, URL to DHCR
  12. Add Academies of Science (and solve their status as HEI and/or/with DH centers or departments).
  13. Make sure each city has a coordinate location and a Wikidata ID.
  14. Add a few more journals
  15. Add their affiliation (is faculty / staff employee) to the academics; that also geolocates them via the institution's city

Data modeling

  1. Model centers with location (in city: city); then city (in country: country) and country (in world region: region)
  2. Use EntitySchema for one class and test items against it.
  3. Check and implement the EDTF feature for from-to in a single, composed value, e.g.: 1994/2010-10 (sometime in 1994 to sometime in October 2010).
  4. Switch over Journals from inception/discontinuation to duration, and adjust the standard query. Done for Journals.
  5. Structure the "fields of work" class using types (also to test that mechanism instead of subclasses): HSS, DH, CS (done); give each field of work a type (done); add this to the standard query (done) and add the types to the data model (done)
  6. For the major / top-level / most used classes: Rework the description of the data model to happen on the classes and properties: Scope note, data model description, prototypical items, link to Entity Schema.
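
The composed from-to value in item 3 can be illustrated with a minimal Python sketch. It assumes only the simplest EDTF forms (YYYY, YYYY-MM, YYYY-MM-DD, joined by "/") and ignores open-ended or uncertain dates; the function names are made up for illustration and are not part of any Wikibase extension.

```python
import calendar
from datetime import date


def edtf_bounds(value: str) -> tuple[date, date]:
    """Earliest and latest day covered by one date given as YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:  # year precision: the whole year
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:  # month precision: the whole month
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:  # day precision: a single day
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported date: {value!r}")


def edtf_interval(value: str) -> tuple[date, date]:
    """Widest span covered by a 'start/end' interval (assumption: both ends are given)."""
    pieces = value.split("/")
    if len(pieces) != 2:
        raise ValueError(f"expected 'start/end', got {value!r}")
    start, end = pieces
    return edtf_bounds(start)[0], edtf_bounds(end)[1]


if __name__ == "__main__":
    # The example from the road map: sometime in 1994 to sometime in October 2010.
    print(edtf_interval("1994/2010-10"))
```

Reading the value as the widest possible span is one interpretation among several; a query could just as well compare only the years.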

Next steps

Data entry

  1. (in progress) Check the centerNet data: add inception date, discontinuation date, specify more fields, add institutions where applicable (part of)
  2. (in progress) Add more information to journals: affiliated associations, editors, year of inception (done), publication model (done), language(s)
  3. (in progress) Add more people with their institutions (as needed)
  4. (next) Add the major sources of information as their own class, and with sources as items, and then use this for source statements in the dataset.
  5. (next) Expand the list of fields of work and base it on some existing list; this one is pretty good: https://docs.vscentrum.be/access/scientific_domains.html
  6. Use more multilingual labels for items and properties, notably: Spanish, but generally: local language (using QS or Special Page).
  7. Add IDHC IDs (URL) to the conferences
  8. Use duration also in all kinds of other places where inception / discontinuation may still be used.
  9. Add more information to centers: leadership roles, year of inception, part of institution.
  10. Add more information to associations: leadership roles, year of inception, city/country, language(s).
  11. Add more information on the conference series of associations, with their associated locations, years, HEIs, etc.
  12. Add some of the longer-running conference series: DHd (done), DH (in progress), ACH, TEI conferences, Tübingen Kolloquium, many more!
  13. Make sure every academic has an ORCID entry (or a note that it is missing).
  14. Add Wikidata IDs to countries using OpenRefine.
  15. Find a way of adding references to many of the statements that come from the bulk sources.
  16. Add Wikidata IDs (and ISSN) for journals
  17. Add this instance to the following registry: https://wikibase.world/ (when some basics are in place)
  18. Add the conference series from IDHC, with association, website, duration. (Later, add the individual conferences.)
  19. For the class "academics", include not just their "field(s) of work", but also their time-indexed job designations. Professorships with specific disciplinary scopes are good indicators of where a discipline stands in terms of institutionalization.
  20. As another indicator of institutionalization, add meta-publications (surveys, reviews, definitions, history of DH)
  21. As another indicator of institutionalization, add associations' membership numbers with timestamp (as far as available)
  22. As another indicator of institutionalization, add training materials (textbooks, handbooks, OERs) with their authors
  23. As another indicator of institutionalization, add dedicated funding programmes (if they exist)
  24. As another indicator of institutionalization, add the date of inception to the study programmes (as far as can be found out)
  25. As another indicator of institutionalization, add book series dedicated to DH.
  26. Add the specific departments people, centers, and study programmes are attached to within universities, because this is very indicative of the position of DH as well: CS vs. Humanities, for example.
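
For item 13, a check of the ORCID coverage could look roughly like the following Python sketch. The records and field names are hypothetical and stand in for whatever an export of the "academics" class provides; the check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its iDs, and the valid example is the sample iD used in ORCID's own documentation.

```python
import re

# Four groups of four characters; the last character may be the check character "X".
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string has the ORCID shape and a matching check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def orcid_report(academics: list[dict]) -> dict[str, list[str]]:
    """Sort academics into 'ok', 'missing' and 'invalid' by their ORCID value."""
    report = {"ok": [], "missing": [], "invalid": []}
    for person in academics:
        orcid = person.get("orcid")
        if not orcid:
            report["missing"].append(person["name"])
        elif is_valid_orcid(orcid):
            report["ok"].append(person["name"])
        else:
            report["invalid"].append(person["name"])
    return report


# Hypothetical records for illustration only.
ACADEMICS = [
    {"name": "Academic A", "orcid": "0000-0002-1825-0097"},
    {"name": "Academic B", "orcid": None},
    {"name": "Academic C", "orcid": "0000-0002-1825-0098"},
]

if __name__ == "__main__":
    print(orcid_report(ACADEMICS))
```

The "missing" list corresponds to the items that would need either an ORCID or an explicit note that none exists.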

Data modeling

  1. (in progress) For the minor classes: Rework the description of the data model to happen on the classes and properties: Scope note, data model description, prototypical items, link to Entity Schema (if already present)
  2. (in progress) Remodel the "located in" Property as "in city" or "in country", depending on context. See: https://mapping-dh.wikibase.cloud/w/index.php?title=Special:WhatLinksHere/Property:P22&limit=100
  3. (next) Expand EntitySchema to further classes;
  4. Rework some classes/subclasses into class with type, when this makes sense. Maybe not as important as I thought. Also, should the "type" be a qualifier on the instance of property or a separate property?
  5. It seems to me that the three types of networks are really just one subclass and could be type-d with an extra attribute, rather than each being their own subclass; also, see above, there should be just one class "(dependent) unit" with types for center, initiative, etc. And "independent institution" for universities, academies of science, etc. Because their data model differs, they should be two classes, but all types are very similar in how they are described. Some initiatives may be "independent institutions" without affiliation (and without location!), so that might be an issue.
  6. Rework the Event class as follows: Distinguish class "(single) event" and class "event series"; then distinguish, using "type" (to be used for both single event or event series): conference, workshop, training school. – Single events can be part of an "event series" or not. Alternatively, possibly, set up a dummy instance of "event series" called "events outside a series" and enforce that any (single) event must be part of a series.
  7. Validate the class "Unit" using the EntitySchema and fix the errors, mostly due to missing websites (P4) or missing institutions that the units are part of. Remodel as "independent institution" where necessary; this may require creating a subclass "independent initiative" as well.
  8. Isn't "Association" actually a kind of independent institution? It doesn't have a top-level institution it belongs to, except for the case of umbrella organization vs. member organization, but each member organization is also an independent entity. Maybe move the associations, let's see.
  9. Similarly to the case of associations, "resource" should really be a subclass of publication, and any publication (edited volumes, in particular) can be a source for Mapping DH, as can URLs.

Open issues

  • At the moment, units (dependent) are distinguished from institutions (independent). Units are part of an institution, and get their city and location from the institution. Institutions are not part of another institution, and get their location from their own city property. However, there are cases where dependent units (like an institute of an academy) have their own location in a city other than the headquarters of the institution; how do we deal with this in a way that doesn't break queries?
  • There are not only centers or associations with projects or working groups, but centers and departments can also have research groups or research units that are smaller than departments, e.g.: https://www.crihn.org/projets/equipes/ What to do with them?
  • The biggest question is whether to maintain a separate Wikibase instance for this whole endeavor, or to try and do this work in Wikidata directly. The upsides of working in Wikidata would be easier queries, less redundancy, less data entry effort, greater visibility and findability of this dataset, and incidental help from other contributors. The downsides would be losing what a separate instance offers: it is easier to understand, there is more freedom with respect to the data model, we can include things without worrying about whether someone objects, and others don't change our data.
  • Where should we record location information? For conferences, centers and departments (and possibly also for people, projects), both the item itself or the hosting institution, or the city where the hosting institution is located, are candidates. We need to balance avoiding redundancy (location information in multiple places), avoiding lack of precision (location information limited to the city granularity when it might be of interest to see where in a larger city one or another university or center is located) and maintaining simple queries (getting location information from various items higher up in the hierarchy is more tricky than directly from an item). => Use "in city" only on independent institutions, and record coordinate locations only on cities, but use city in a rather fine-grained manner.
  • Merge or distinguish "has affiliation" (center has affiliation with institution) and "is member of" (center is member of centerNet)? We also need a way to express that an event is organized by a center, institution, or person.
  • Merge or distinguish "is host" (center is host of event) and "runs" (person runs project or initiative) and "leads" (person leads a center)
  • Model things bidirectionally, or infer the inverse relationship if necessary? E.g., a center can have leadership, and a person can lead a center. Encode both or not?
  • Use "exact match" or "Wikidata ID" to link items in this Wikibase to Wikidata?
  • How to handle the "editor" or "director" or "executive officer" properties: Only for "journal/center has editor X", or also the inverse, "person is editor of journal/center"? Or does this require two different properties? Check Wikidata.
  • If now using "affiliation" also for the relationship between a center and an HEI, the property "host institution" may no longer be needed.
  • (resolved) Merge or distinguish centers, initiatives and departments? They have a very similar data model. The properties available to them are the same anyway (this is not currently constrained), so they can all be treated the same; but they are instances of different classes (center, initiative, department). The alternative would be to say "instance of center" (or "unit") in a very broad sense, and then distinguish the type by saying: has type center|initiative|department. The former seems easier and is what was adopted.
  • Is "Training School" an event (and often part of an event series) or is it a Study Programme? Currently, it could be both.
  • How could a selection of projects be included here? There are too many to include them all, but some clearly have a role in the institutional history of DH, such as DARIAH-EU for example. But we need criteria for inclusion and exclusion, e.g. international scope (activity in at least n countries) or persistence (duration of at least n years). Tricky.
  • Encode "in city" for units directly on the unit, or have them all be "part of" an institution and derive the city from there? Tricky for cases where initiatives or networks don't have an institutional affiliation, or no location at all (those are placed on an island). Tricky also for queries, when both need to be checked in order to map all possible units.
  • Similarly, what about "in city" for people? Locate them only via their (various) roles in (various) parts of an institution with "in city", or always give them one or several cities to make that easier?
  • Develop a plan for sustainability: Export of data to standard formats for archiving on Zenodo; enlist and train collaborators to keep specific pieces of information up to date (e.g. association leadership data, journals' editorial teams); think about data integration with Wikidata.
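
The location questions above (units with their own city, units that inherit a city through "part of", and initiatives with no location at all) can be made concrete with a small Python sketch. It is one possible resolution rule, not the adopted model: prefer an item's own "in city" value and otherwise walk up the "part of" chain. All identifiers, city names and field names are hypothetical.

```python
from typing import Optional

# Hypothetical, simplified records; the keys mirror "in city" and "part of" statements.
ITEMS = {
    "academy": {"in_city": "City A", "part_of": None},         # independent institution
    "institute": {"in_city": "City B", "part_of": "academy"},  # unit with its own location
    "center": {"in_city": None, "part_of": "academy"},         # unit that inherits its location
    "network": {"in_city": None, "part_of": None},             # no affiliation, no location
}


def resolve_city(item_id: str, items: dict) -> Optional[str]:
    """Return the item's own city if present, else the first city found up the 'part of' chain."""
    seen = set()
    current = item_id
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental 'part of' cycles
        record = items[current]
        if record.get("in_city"):
            return record["in_city"]
        current = record.get("part_of")
    return None  # unlocated items need separate handling, e.g. the placeholder "island"


if __name__ == "__main__":
    for item_id in ITEMS:
        print(item_id, "->", resolve_city(item_id, ITEMS))
```

Under this rule a query has to check both places, which is exactly the cost noted above; the benefit is that a unit located away from its institution's headquarters can carry its own city without breaking the default.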

Other concerns

  • Link from start page text to scope and goals
  • Visualize data model as a network of classes and properties, maybe using WebProtégé?
  • Think about additional publication types, e.g.: the "DHd-Blog" is relevant for community building. Is it a network or a publication?
  • In the queries about items in a city or country: Show the in-query search for entity numbers, or just write a comment
  • Do a little demo video that explains the purpose, the contents, the data model and shows some of the possibilities
  • Provide a bit more background on the project: people, duration, financing.
  • Entities with 0 statements: Q125, Q126; write a query to find such entities, as well as duplicates (very similar label)
  • Fix wrong DHCR ID: https://mapping-dh.wikibase.cloud/wiki/Item:Q3 (25)
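
The check for entities with 0 statements and for likely duplicates could be prototyped outside the query service. Here is a rough Python sketch that assumes the items have already been exported into a list of records with an id, a label and a statement count. The records, field names and the 0.9 similarity threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical export; the IDs and labels are made up and do not refer to real items.
ITEMS = [
    {"id": "Q9001", "label": "Centre for Digital Humanities", "statements": 5},
    {"id": "Q9002", "label": "Center for Digital Humanities", "statements": 0},
    {"id": "Q9003", "label": "Journal of Example Studies", "statements": 3},
]


def without_statements(items: list[dict]) -> list[str]:
    """IDs of items that have no statements at all."""
    return [item["id"] for item in items if item["statements"] == 0]


def similar_labels(items: list[dict], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Pairs of item IDs whose lowercased labels are at least `threshold` similar."""
    pairs = []
    for a, b in combinations(items, 2):
        ratio = SequenceMatcher(None, a["label"].lower(), b["label"].lower()).ratio()
        if ratio >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs


if __name__ == "__main__":
    print("No statements:", without_statements(ITEMS))
    print("Possible duplicates:", similar_labels(ITEMS))
```

Comparing every pair of labels is quadratic in the number of items, which should be acceptable for a dataset of this size but would need blocking (e.g. by class) for much larger ones.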