User talk:Iamcarbon
Welcome to Wikidata, Iamcarbon!
Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!
Need some help getting started? Here are some pages you can familiarize yourself with:
- Introduction – An introduction to the project.
- Wikidata tours – Interactive tutorials to show you how Wikidata works.
- Community portal – The portal for community members.
- User options – including the 'Babel' extension, to set your language preferences.
- Contents – The main help page for editing and using the site.
- Project chat – Discussions about the project.
- Tools – A collection of user-developed tools to allow for easier completion of some tasks.
Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.
If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.
Best regards!
Notability
Thank you for contributing to Wikidata. I see that you recently created an item that does not clearly indicate its notability. The Wikidata project only accepts items that meet its notability criteria, and your item is therefore likely to be deleted soon. In brief, items must have an associated Wikipedia article, must be needed for statements on another notable item, or must have both identifiers and serious sources. For the last case, a good indication of notability would be multiple articles about the subject in independent publications like newspapers or magazines. You can add such sources as references to specific claims using reference URL (P854), or as top-level claims using described at URL (P973).
Also, this may not apply in this specific case, but you should know that we discourage editors from contributing on topics with which they have a strong personal connection, as this may present a conflict of interest. If you are being paid to edit here, then you are obliged to disclose this. For a longer version, you might find it useful to read the essay "How to create an item on Wikidata so that it won't get deleted". Madamebiblio (talk) 10:32, 20 March 2024 (UTC)
Google Knowledge Graph Search API
I'm curious which knowledge graph API you're using. I recently imported a bunch of entries for people and was stuck with a 1QPS limit. I was using the enterprise API which is still in preview. Is the source for your bot public? I'm curious how you did the matching (e.g. in ambiguous cases).
I really wish Google's API here was better. They clearly have a lot of data tied to these identifiers that they aren't releasing. BrokenSegue (talk) 14:31, 23 March 2024 (UTC)
- Hi @BrokenSegue!
- I'm also using the same Google Knowledge Graph (GKG) API, but have been able to stay under the 60qpm API limit as the rest of our pipeline is VERY slow.
- For context -- I have been building and annotating an internal dataset consisting of images (mostly in the Arts, Architecture, and Fashion fields) over the past ~10 years, which has grown to about 15M images. I recently ran these through several new AI models (GPT-4 and Claude), and looked up the overlapping labels on Google Knowledge Graph (to determine basic notability / significance). We cross-checked these against Wikidata to establish additional notability, where we discovered around 70K missing ids.
- I was hoping to contribute the most notable of these entities and concepts back to Wikidata to improve interoperability with the Knowledge Graph -- but have been treading softly to make sure these contributions meet the project's goals and policies.
- I've started this process by manually looking up each entity / concept and determining whether it already exists (and just needs an association), or deciding whether to add it.
- Most of these knowledge graph entities DO exist. When there are only a few items with the same name, it's quick to find the right one and associate the id. When there are a lot of items, it takes time to manually look through them all to find the right one (particularly when the items don't have descriptions or labels). In the worst case, like "Vase of Flowers", there are hundreds of items with the exact same name, and we need to match the actual photo. Matching specific pieces of ART is the most tedious, and takes a lot of time. My plan is to automate this in the future using a vision model.
- In the case where I have checked all the existing items with the same name, and various aliases and alternate identifiers, and am confident that they do not exist, I have been doing a basic notability check (e.g. making sure that they have multiple pages of Google results mentioning them, with the source coming up first). It's been more difficult to find and reference reputable sources that aren't written by the author, as most of the top-ranking websites are polluted with SEO or locked behind a paywall. It can take up to 15 minutes per item to find 1-2 good sources -- even when the item is well known.
- I am estimating that out of 100 newly discovered knowledge graph labels that I've found, only 10 passed my basic notability check. The majority of these already exist (and just need the id association), and maybe 1-2 (of the 100) have been manually added. I've added around 250 items in total using this process. Iamcarbon (talk) 03:28, 24 March 2024 (UTC)
- Very cool work. Is this part of a research project or a business or is this just a hobby project? My imports of gkids were done much more conservatively, as I checked to see if the URL they had on file matched the enwiki sitelink. I've also been interested in trying to import Bing entity ID (P9885) but their API is too expensive last I checked. BrokenSegue (talk) 15:18, 24 March 2024 (UTC)
- This is currently a personal project exploring how to improve the ability of AI/ML algorithms to cite authoritative data. I'm hoping to apply this work to research / personal knowledge building in the future.
- Comparing URLs makes a lot of sense. Once I get a bot going, I'll see if I can make any additional matches using this approach too.
- I haven't looked at the Bing entity ids, but these would also be great to get associated. I'll add this to my list to look into. Iamcarbon (talk) 21:13, 26 March 2024 (UTC)
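The URL comparison mentioned above (checking that the URL on file matches the enwiki sitelink) could be sketched roughly as follows. This is only an illustration, not either editor's actual code: the function names and the normalisation rules (ignoring the scheme, a `www.` prefix, the mobile `.m.` host, trailing slashes, and the difference between spaces and underscores in titles) are assumptions.

```python
from urllib.parse import urlsplit, unquote


def normalize_wiki_url(url: str) -> str:
    """Reduce a Wikipedia URL to a comparable 'host/path' form (assumed rules)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Treat the mobile host as the same site as the desktop host.
    host = host.replace(".m.wikipedia.org", ".wikipedia.org")
    # Decode percent-escapes and unify spaces/underscores in the title.
    path = unquote(parts.path).replace(" ", "_").rstrip("/")
    return f"{host}{path}"


def urls_match(candidate_url: str, sitelink_url: str) -> bool:
    """True when both URLs point at the same article after normalisation."""
    return normalize_wiki_url(candidate_url) == normalize_wiki_url(sitelink_url)
```

A match of this kind only confirms that two records point at the same article; ambiguous cases with no URL on file would still need manual review.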
Q125214215
Hello! I think iPhone 13 Pro Max (Q125214215) is the same as iPhone 13 Pro Max (Q108541741) and they should be merged. If you don't know how, I can! -wd-Ryan (Talk/Edits) 00:27, 30 March 2024 (UTC)
- Identical. Merged! (thank you!) Iamcarbon (talk) 01:54, 30 March 2024 (UTC)
Edition, not Book
Please use version, edition or translation (Q3331189) instead of book edition (Q57933693). And please do not use "book" in an English description; WikiProject:Books decided long ago to avoid using "book" because it has too many meanings in English and is confusing. --EncycloPetey (talk) 03:33, 16 June 2024 (UTC)
- Hi EncycloPetey.
- What would you suggest we use instead of "book" in the description to convey that the edition is a physical object that is printed, bound, protected by a cover (hard or soft), and assigned an ISBN (a numeric commercial book identifier) by a publisher?
- Are there any discussions you can point me to here? Iamcarbon (talk) 04:09, 16 June 2024 (UTC)
- We are rarely talking about a specific physical object. If we did, then we would be talking about one person's book, on the shelf, in their home. We are almost never talking about that. We usually mean an edition published on a certain date, including all the copies that were printed then. Or we are talking about a translation into another language, including all copies of that particular translation, not just one book.
- But a "book" can also be the work, in all its editions and translations. A "book" can be the volumes that make up a set, such as an encyclopedia. A "book" can be part of a literary work; and many classical texts are divided into "books". A "book" can also be an abstraction, such as the "book of love" or the "book of life". A "book" can be a major section of a work; The Lord of the Rings was published in three volumes, but it consisted of six "books".
- Even in a library, "book" might mean the physical objects on the shelves, in which case two copies of the same edition would count as two different "books". Or it might refer to digital "books" accessed via download. Or audio "books" on cassette, disc, or downloaded.
- So the reason we do not use "book" is that it is ambiguous. Once you avoid using "book", it becomes clear that there are many things that are "books", and so many in fact that the term is useless for Wikidata. This discussion comes up over and over and over, and WikiProject:Books is a frequent place where this discussion happens. --EncycloPetey (talk) 06:30, 16 June 2024 (UTC)
- Hi @EncycloPetey Thanks for the information! I've updated my scripts to use "version, edition or translation" and remove the use of book in the descriptions. Iamcarbon (talk) 00:16, 17 June 2024 (UTC)
Please do not turn a data item for an edition into a data item for a work. The two have completely different data. If the data on an item are primarily information for an edition (publisher, ISBN, number of pages, Open Library edition ID, Goodreads edition ID, LCCN, etc.) then the data item is for an edition. --EncycloPetey (talk) 02:23, 20 June 2024 (UTC)
For example, since you changed Lost Lives, Lost Art: Jewish Collectors, Nazi Art Theft, and the Quest for Justice (Q76574283) to a work instead of an edition, all of these data must be removed. They are data for an edition instead of a work. Please create a data item for the edition, and add all of the removed data to the edition data item. Otherwise, please restore the data and change it back to an edition data item. --EncycloPetey (talk) 02:27, 20 June 2024 (UTC)
- Nice spotting! I was actively in the process of breaking that object out into its own edition, but it looks like you already started updating this object (let me know if there's anything else to do on Q76574283). Great to see you watching over the data and to have your help bringing up data quality!
- Here's another example of an object that I broke into its work and edition:
- https://www.wikidata.org/wiki/Q19595428
- I wouldn't mind a second set of eyes on my contributions, as I've been breaking out thousands of books into their editions. I'm treading lightly, but plan to eventually break out ALL of the editions / translations / etc. from their works. Feedback is always welcome! Iamcarbon (talk) 02:40, 20 June 2024 (UTC)
- @EncycloPetey Curious, do you get pinged automatically when I respond? Or does this require a mention in the response? Iamcarbon (talk) 02:41, 20 June 2024 (UTC)
- I do not get pinged automatically, but I usually watch for additional conversation. On Q19595428, note that the Internet Archive ID is for a scan of a specific edition, and should be placed on the correct edition data item. --EncycloPetey (talk) 03:07, 20 June 2024 (UTC)
Mul
You might want to hold off mass deletion of labels in favour of mul until it beds in a bit. Are you following https://www.wikidata.org/wiki/Help_talk:Default_values_for_labels_and_aliases Vicarage (talk) 14:40, 30 July 2024 (UTC)
- @Vicarage Agreed. And am now subscribed to Default_values_for_labels_and_aliases directly! Iamcarbon (talk) 18:38, 30 July 2024 (UTC)
Stop mul
Hi @Iamcarbon. Are you making massive changes without having requested bot permission? Please, you do NOT need to delete the labels that were already there. Thank you. Madamebiblio (talk) 04:05, 28 September 2024 (UTC)
- If you delete the label from the names, some gadgets stop working. For this reason, I have now restored thousands of labels across a few hundred family names.
- For those looking to undo the deletions for family names in a given language, here's a query: it lists family names that have P407 set but are missing a label in that same language.
- With User:Harmonia_Amanda/namescript.js, hundreds of labels can be restored with a single click.
Example German:
# Family names (Q101352) whose language is German (Q188) but that have no German label.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q101352;
        wdt:P407 wd:Q188.
  MINUS { ?item rdfs:label ?label FILTER ( lang(?label) = "de" ) }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Example Finnish:
# Family names (Q101352) whose language is Finnish (Q1412) but that have no Finnish label.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q101352;
        wdt:P407 wd:Q1412.
  MINUS { ?item rdfs:label ?label FILTER ( lang(?label) = "fi" ) }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Pallor (talk) 09:05, 28 September 2024 (UTC)
- Hi @Pallor
- These deletions were intentional and are necessary to identify potential issues before larger-scale deletions occur. Selectively deleting a small batch of labels each day helps discover any tools that may break and identify bots that are not yet 'mul'-aware. This proactive approach helps us address problems early and minimize disruptions when mass deletions eventually happen.
- There are already bots deleting labels in bulk across other less popular domains (e.g. astronomical objects), and these deletions will also be taking place en masse for names (given and family) soon. By conducting a limited number of deletions now and engaging with tool owners, bot owners, and the community early, we can significantly reduce the impact of the upcoming mass removals that are planned to take place.
- Note that these deletions have been intentionally limited to identify any new issues. We still have several open Phabricator issues to prevent new labels (and duplicate items) from being re-added, and the community is still becoming aware of this feature.
- Are there any specific gadgets or tools that you have found not working as a result of these deletions? Sharing any details would be very helpful so we can assist in making them 'mul'-aware.
- It would also be very helpful if you could share your thoughts and feedback in either of the following discussions:
- WikiProject Names (proposal for 'mul' adoption): https://www.wikidata.org/wiki/Wikidata_talk
- Mul_labels_-_proposal_of_massive_addition
- And the general discussion on deleting values and labels: https://www.wikidata.org/wiki/Help_talk
- I have ceased making any additional deletions until your concerns can be addressed. Iamcarbon (talk) 00:49, 29 September 2024 (UTC)
Missing mul constraint
I believe that, before massively removing the multi-language "reflexive" labels, a blocking constraint should be put in place to preemptively block adding identical language labels whenever a mul label is present. The same for aliases. Otherwise we have the risk that your changes are undone (explicit rollback, or implicit additions). There are users that don't know the Default values functionality. I have already updated my bots to no longer add duplicate language labels. --Geertivp 08:33, 28 September 2024 (UTC)
- Hi @Geertivp! Do you know if we have a Phab / tracking issue for adding a constraint to prevent duplicates from being re-introduced? Without the constraint, I agree -- any removals will cause more trouble than they solve.
- For some additional context: so far, my removals have been intended to identify any bots and tools that need to be made mul-aware, and identify any other issues that we're not aware of yet before a broader rollout. I anticipate we may find additional issues over the next few months, and that all these issues can be worked on concurrently before any deletion rules are codified by bots. Also, thanks for updating your bot!!!
- I also intended for my deletions to stir up some trouble, add pressure for tool and bot owners to update, and help us gather any critical feedback on any issues that need to be addressed by Wikidata developers for us to fully adopt mul. Iamcarbon (talk) 01:15, 29 September 2024 (UTC)
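The check proposed in this thread (not adding a language label or alias that merely repeats the mul value) could look something like the sketch below on the bot side. It is a hypothetical client-side guard, not an existing Wikidata constraint or API; the function names and the plain-dict representation of labels and aliases are assumptions made for illustration.

```python
def is_redundant_label(labels: dict[str, str], lang: str, value: str) -> bool:
    """True if adding `value` as the `lang` label would only repeat the mul label."""
    if lang == "mul":
        return False  # setting the default label itself is never redundant
    mul_label = labels.get("mul")
    return mul_label is not None and value == mul_label


def is_redundant_alias(aliases: dict[str, list[str]], lang: str, value: str) -> bool:
    """True if `value` is already present as a mul alias."""
    return lang != "mul" and value in aliases.get("mul", [])


# Hypothetical item data, used only to show the intended behaviour.
labels = {"mul": "Geramb"}
aliases = {"mul": ["von Geramb"]}
print(is_redundant_label(labels, "de", "Geramb"))       # duplicate of the mul label
print(is_redundant_alias(aliases, "de", "von Geramb"))  # duplicate of a mul alias
```

A bot that skips edits flagged by such a check would avoid re-introducing duplicates even before a server-side constraint exists.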
Warning
Please refrain from edits that have been criticised on AN until the discussion is resolved. If you do not, you will be blocked. --Wüstenspringmaus talk 16:40, 2 October 2024 (UTC)
- Hi there. I have responded on the administrators' noticeboard. Happy to request a bot permission, as needed - once guidance has been provided. Iamcarbon (talk) 17:17, 2 October 2024 (UTC)
- I have gone ahead and made a request for a bot permission.
- https://www.wikidata.org/wiki/Wikidata:Bot_requests#Request_to_add_mul_label_values_to_names_.._(2024-10-01) Iamcarbon (talk) 17:24, 2 October 2024 (UTC)
- Request for bot permission is supposed to happen at Wikidata:Requests_for_permissions/Bot (yes it's confusing) BrokenSegue (talk) 17:45, 16 October 2024 (UTC)
- Many thanks! I was unable to find any guidance on how to request a bot permission. Iamcarbon (talk) 18:33, 16 October 2024 (UTC)
Bulk removal of mul
Did you get approval for this change? Your edits are going really fast suggesting that you are running an unapproved bot (that isn't rate limiting). The only discussion I see is at Wikidata talk:WikiProject Names where I don't see consensus for this change.
I have temporarily blocked you until you respond here. BrokenSegue (talk) 17:44, 16 October 2024 (UTC)
- Hi @BrokenSegue. This was a batch edit of 1000 items to remove duplicate aliases on family names, running within the rate limit of 90 edits per minute.
- Quick context on this edit: right now, we're blocked on removing primary labels from given and family names, as we wait for the development team to address Phab issues related to search rankings and duplication checks. These issues, however, do not prevent us from removing duplicated aliases. And there are no known current issues with deleting aliases.
- I think we should continue to test this on a limited set of items to make sure these removals don't surface any new issues, before deleting the remaining 50M+ duplicated name aliases on 650K+ items. Ideally an existing bot can take this over -- or I will codify this rule, once my bot request is approved.
- RE the bot role: There was a fairly extensive discussion about automated edits, and whether a bot is required for these changes here. While there are benefits of running the changes under a bot, it looks like our policy is that these bulk edits are also permissible under a user account.
- https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard/Archive/2024/10#c-Madamebiblio-20241002161300-To_be_considered_by_other_sysop
- Let me know if these edits are disruptive, or if you think this is a reasonable plan before my bot account is approved. I understand that the Wikidata backend services are under some pressure to scale, and given that names are responsible for over 10% of all duplicated labels, I'm hoping to help make sure we identify and address any issues blocking their deletion. Iamcarbon (talk) 18:32, 16 October 2024 (UTC)
- Alright, take 2! I've gone ahead and created a new bot account, and requested permissions here:
- https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/CarbonBot Iamcarbon (talk) 18:53, 16 October 2024 (UTC)
- A bit more context here, as well: the deletion of aliases was proposed in "Should aliases same as mul label be removed?" under https://www.wikidata.org/wiki/Help_talk:Default_values_for_labels_and_aliases
- With three supporters, and no objections. Note that Pallor continues to have objections with removing labels (not aliases), as this borks up the search rankings.
- I'm also still unsure if removing aliases will have unintended side effects, and the default labels group hasn't been getting much feedback until we break something for someone. Once we make these changes under the radar using a bot (under a pre-approved task), it's less likely these changes will be seen or receive feedback.
- I have also engaged with both the names and label groups, to solicit feedback on these new bot tasks (via CarbonBot).
- Let me know your thoughts. Iamcarbon (talk) 19:37, 16 October 2024 (UTC)
- 90 edits per minute seems quite fast. Typically you should be watching max lag and going slower than that. Please take a look at the maxlag documentation and consider further rate limiting in the future.
- I understand the use case and that you were doing a limited set. I won't comment on whether this is good as I'm not very knowledgeable about mul. BrokenSegue (talk) 21:42, 16 October 2024 (UTC)
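The advice above (go slower than a fixed edit rate and respect maxlag) could be sketched as below. This is a minimal illustration rather than any bot's actual code: the `EditThrottle` class and `backoff_seconds` helper are hypothetical names, and the error shape (`{"error": {"code": "maxlag", "lag": ...}}`) is assumed from the MediaWiki maxlag documentation; the request itself, sent with a `maxlag` parameter, is left out.

```python
import time


class EditThrottle:
    """Space out edits so that no more than `edits_per_minute` are sent."""

    def __init__(self, edits_per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / edits_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous edit was allowed

    def wait(self) -> float:
        """Block until the next edit is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


def backoff_seconds(api_response: dict, default: float = 5.0) -> float:
    """Return how long to pause after a maxlag error, or 0 if there was none."""
    error = api_response.get("error", {})
    if error.get("code") != "maxlag":
        return 0.0
    # Pause for at least `default` seconds, longer if the reported lag is higher.
    return max(default, float(error.get("lag", 0.0)))
```

In a batch loop, one would call `throttle.wait()` before each edit and sleep for `backoff_seconds(response)` before retrying whenever it returns a non-zero value.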
- There is a very gross misunderstanding here, or you are consciously going against the promise of the community and your own commitments. It was said that the whole project is about languages using the Latin script. It was also said that you delete aliases where the label and alias are the same. Compared to this, you are constantly deleting the aliases of languages using the Arabic and Cyrillic scripts, as well as all aliases that do not have a tag. This element: Geramb (Q130466234) has a relatively short page history, and it is easy to follow that your edits are completely opposite to those previously discussed. Pallor (talk) 21:02, 19 October 2024 (UTC)
- Hi @Pallor
- May we move this discussion to the names & labels discussion, so we can get community input on this? I'm unsure if I understand the concerns here.
- Two edits were made to the Geramb page:
- 1) The mul label was set to the native value, which matches the item's native label "Geramb", in multiple languages.
- 2) The duplicate aliases, matching Geramb were removed. Iamcarbon (talk) 21:13, 19 October 2024 (UTC)
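The two edits described above can be sketched as a small planning function. This is an assumed reading of the rule (exact string equality with the mul label), not the script that was actually run; the function names and the sample data are hypothetical and do not reflect the real contents of Q130466234.

```python
def set_mul_label(labels: dict[str, str], native_label: str) -> dict[str, str]:
    """Return a copy of `labels` with the mul label set to the native label."""
    updated = dict(labels)
    updated["mul"] = native_label
    return updated


def duplicate_aliases(labels: dict[str, str], aliases: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per language, the aliases that exactly repeat the mul label."""
    mul_label = labels.get("mul")
    if mul_label is None:
        return {}
    removals = {}
    for lang, values in aliases.items():
        dupes = [value for value in values if value == mul_label]
        if dupes:
            removals[lang] = dupes
    return removals


# Hypothetical data: only the alias identical to the mul label is flagged.
labels = set_mul_label({"de": "Geramb", "fr": "Geramb"}, "Geramb")
aliases = {"de": ["Geramb", "von Geramb"], "hu": ["Geramb"], "fr": ["de Geramb"]}
print(duplicate_aliases(labels, aliases))
```

Under exact matching, aliases that differ from the mul label in any way (including those in another script) are left alone, which makes it easy to compare the planned removals against what was agreed in the discussion.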
- Hi, your deletion of labels is making Qs appear when doing WD queries in languages other than English. Can you take a look, please? Kippelboy (talk) 05:41, 26 October 2024 (UTC)
- Hi @Kippelboy
- Are you able to share your WD query and language so we can figure out what broke here? It's likely that the query can be adjusted to be mul-aware. This change is also needed to make sure your queries work with new items that may only have a single mul label. There are already quite a few of those now.
- Some more context here: A limited number of labels were deleted when they matched the default label to help identify issues related to the default values and labels feature. There are plans (now delayed) to remove many more duplicated labels, and we want to make sure this doesn't cause too much disruption. Iamcarbon (talk) 19:04, 26 October 2024 (UTC)
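The symptom reported above (bare Q-ids showing up in other languages) is what a label lookup produces when its fallback chain does not include mul. The sketch below illustrates the idea with plain dictionaries; `display_label` and its default fallback order are assumptions for illustration, not the query service's implementation. In the query service, the analogous change is probably to add `mul` to the label service's language list, though that is not confirmed in this thread.

```python
def display_label(item_id: str, labels: dict[str, str], lang: str,
                  fallbacks: tuple[str, ...] = ("mul", "en")) -> str:
    """Pick the label for `lang`, falling back in order, else show the item id."""
    for code in (lang, *fallbacks):
        if code in labels:
            return labels[code]
    return item_id


# Hypothetical item that only carries a mul label.
labels = {"mul": "Geramb"}
print(display_label("Q130466234", labels, "ca"))                    # mul-aware lookup
print(display_label("Q130466234", labels, "ca", fallbacks=("en",)))  # lookup without mul
```

The second call falls through to the item id because neither the requested language nor English has a label, which matches the behaviour described in the report.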