Discussion of Python in High Energy Physics https://hepsoftwarefoundation.org/activities/pyhep.html
since the knowledge in this channel proved invaluable before, another question :)
I have
group_1 = np.array([(1, 2), (3, 3), (5, 7), (4, 4)])
test_elements = np.array([(1, 2), (3, 3), (3, 5)])
and would like to test whether the elements in test_elements are in group_1. I expect the result
[True, True, False]
as I take the tuples as unique objects.
NumPy has the function isin, where
np.isin(group_1, test_elements)
will return
[[True, True], [True, True], [True, False], [False, False]]
OK, so this is the inverse of what I want, fine.
np.isin(test_elements, group_1)
# returns
[[True, True], [True, True], [True, True]]
Clearly it compares element by element, and since both 3 and 5 are contained, (3, 5) should be as well, right?
Well, not in my case. Is there a way to do this comparison for each 2-vector instead of element-wise? A for loop (even with numba) is quite slow.
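For reference, a minimal reproduction of the behaviour described in the question, assuming only NumPy and the two arrays given above:

```python
import numpy as np

group_1 = np.array([(1, 2), (3, 3), (5, 7), (4, 4)])
test_elements = np.array([(1, 2), (3, 3), (3, 5)])

# np.isin tests every scalar of the first argument on its own against the
# flattened second argument, so the result has the shape of the first argument.
elementwise = np.isin(test_elements, group_1)
print(elementwise.shape)  # (3, 2)

# Collapsing each row with all() still reports (3, 5) as present,
# because 3 and 5 each occur somewhere in group_1.
rowwise = elementwise.all(axis=1)
print(rowwise.tolist())  # [True, True, True]
```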
Yes, you can do that. The idea is to compare every element of one array with every element of the other. This gives you a rank-three boolean array with the axes: number of elements in the group, number of elements to test, dimension of an element. Then apply two reductions: 1. an all over the tuple axis, so that a pair is true only if every component of the tuple matches, and 2. an any over the axis of all the possible combinations, since at least one tuple has to match fully.
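For example (the axes may be reordered for convenience):
test_elements_expanded = np.expand_dims(test_elements, axis=1)  # shape (3, 1, 2)
entries_equal = group_1 == test_elements_expanded  # broadcasts to shape (3, 4, 2)
tuple_equal = np.all(entries_equal, axis=2)  # shape (3, 4): every component matches
tuple_contained = np.any(tuple_equal, axis=1)  # shape (3,): one flag per test element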
entries_equal is off :(
tuple_equal is true where all elements in a tuple are true, otherwise false. Reducing axis 1 with any means that an entry with at least one matching tuple is true, so you're left with axis 0.
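The two reductions can be sketched as one self-contained function, with the shape of each intermediate noted in the comments. The name rows_in is hypothetical, and the axis order (test elements first, group second) is just one possible choice:

```python
import numpy as np

def rows_in(test, group):
    """Return one boolean per row of `test`: is that row also a row of `group`?"""
    # (n_test, 1, dim) == (n_group, dim) broadcasts to (n_test, n_group, dim)
    entries_equal = test[:, np.newaxis, :] == group
    # a pair matches only if every component matches -> (n_test, n_group)
    tuple_equal = entries_equal.all(axis=2)
    # a test row is contained if it matches any group row -> (n_test,)
    return tuple_equal.any(axis=1)

group_1 = np.array([(1, 2), (3, 3), (5, 7), (4, 4)])
test_elements = np.array([(1, 2), (3, 3), (3, 5)])
result = rows_in(test_elements, group_1)
print(result.tolist())  # [True, True, False]
```

Note that the intermediate array holds n_test * n_group * dim booleans, so memory grows with the product of the two array lengths.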
You can also map the tuples to scalars (easy if you have some idea what the values are going to be), e.g.:
def squash(x):
    # unique per pair as long as 0 <= x[:, 1] < 10000
    return 10000 * x[:, 0] + x[:, 1]
np.in1d(squash(test_elements), squash(group_1))
This should be more memory-efficient and faster if both arrays are large.
If you are going to do membership testing repeatedly on the same array, it might be even better to convert it to a set, dictionary or some other object backed by a hash table, so membership tests are a constant time operation.
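A minimal sketch of the hash-table idea, assuming the rows are small enough that converting them to Python tuples is acceptable; group_set is a hypothetical name:

```python
import numpy as np

group_1 = np.array([(1, 2), (3, 3), (5, 7), (4, 4)])
test_elements = np.array([(1, 2), (3, 3), (3, 5)])

# Build the hash-backed lookup once; tolist() yields plain Python ints,
# and each row becomes a hashable tuple.
group_set = set(map(tuple, group_1.tolist()))

# Each membership test is then an average constant-time set lookup.
contained = [tuple(row) in group_set for row in test_elements.tolist()]
print(contained)  # [True, True, False]
```

The conversion itself is a Python-level loop, so this only pays off when the same group is queried many times.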
Where does the 10000 come from?
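As far as the snippet shows, 10000 is simply a factor assumed to be larger than every value in the second column, so that each pair maps to a distinct scalar. A sketch that derives the factor from the data instead, assuming non-negative integer entries; squash_with and base are hypothetical names, and np.isin is used in place of np.in1d (for 1-D input the two give the same result):

```python
import numpy as np

def squash_with(x, base):
    # Unique per pair as long as 0 <= x[:, 1] < base.
    return base * x[:, 0] + x[:, 1]

group_1 = np.array([(1, 2), (3, 3), (5, 7), (4, 4)])
test_elements = np.array([(1, 2), (3, 3), (3, 5)])

# One more than the largest second component seen in either array.
base = int(max(group_1[:, 1].max(), test_elements[:, 1].max())) + 1

contained = np.isin(squash_with(test_elements, base), squash_with(group_1, base))
print(contained.tolist())  # [True, True, False]
```

For very large values the product base * x[:, 0] can overflow the integer dtype, so this mapping is only safe when the value range is known to be modest.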
Comparing the two solutions:
yours: 28366.0 microseconds
stackoverflow: 10999.0 microseconds
So essentially the same speed for this exact problem. What matters at this point: do you call it once or a million times? How big is your array really? That's when things like presorting can make a difference. My advice: use whichever method you understand or like better (conceptually, not for its speed) and only try to improve on it if it proves to be a bottleneck.
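Which two implementations were timed above is not shown in the thread. As a sketch of how such a comparison could be set up, the following times the broadcasting and the scalar-mapping approaches on random data; the array sizes, the value range and the function names are assumptions made for illustration:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 100, size=(500, 2))
test = rng.integers(0, 100, size=(500, 2))

def broadcast(test, group):
    # all-pairs comparison, then all() over components and any() over group rows
    return (test[:, None, :] == group).all(axis=2).any(axis=1)

def squashed(test, group, base=100):
    # valid here because every entry lies in [0, 100)
    return np.isin(base * test[:, 0] + test[:, 1],
                   base * group[:, 0] + group[:, 1])

# Both approaches should agree before their timings are compared.
same = np.array_equal(broadcast(test, group), squashed(test, group))

for fn in (broadcast, squashed):
    seconds = timeit.timeit(lambda: fn(test, group), number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e6:.1f} microseconds")
```

The numbers depend strongly on the array sizes and the machine, so they are only meaningful for the shapes you actually use.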
Registration is now open for PyHEP 2019, in Abingdon, UK, from the 16th to 18th of October! The registration fee for the 2.5 days has been set at £80; it includes the venue, lunches, dinners, and refreshments. We also have about 46 rooms at Cosener’s House, available on a first-come-first-served basis. The actual payment system will not be online for a few more days, however, so you’ll only be able to complete registration, including the room booking, once it is.
The agenda is also shaping up, with talks confirmed on topics including histogramming, statistical methods, distributed workflows, visualisation, and even GPU programming. Several speakers from industry are confirmed, including our keynote speaker on the PyViz library.
Since the PyHEP series is all about growing a “Python in High Energy Physics” community, this year we’re also including a session of lightning talks where 30 people can present any topic of their choosing for 3 minutes with a single slide, as a way for everyone, especially newcomers and early-career researchers, to introduce themselves.
Community members can also propose presentations on any topic (email: pyhep2019-organisation@cern.ch). We are particularly interested in new(-ish) packages of broad relevance.
More details can be found on the Indico page (https://indico.cern.ch/e/PyHEP2019) or on the PyHEP WG homepage (http://hepsoftwarefoundation.org/activities/pyhep.html). You can also join the HSF forum (https://groups.google.com/forum/#!forum/hsf-forum) to get more information about the workshop and community.
Hi @all, we are looking for PhD students in physics, computer science, and data science to attend a three-day OpenHack in September to analyze real physics data from the LHCb experiment at CERN using Microsoft AI technologies.
An OpenHack is challenge-based rather than instruction-based. Students will work directly with physicists from CERN and Cloud Advocates from Microsoft. They will progress through these challenges to analyze data from LHCb and search for the “unexpected” in particle collisions:
Data exploration and visualization
Classification and anomaly detection
Source control and automation
AML experimentation
AML for hyperparameter tuning
Real-world application of data
The OpenHack will be held Sept. 11-13 in northern Italy at Fondazione Bruno Kessler, a scientific research institute affiliated with CERN. Students need pay only for their travel and lodging – there is no registration fee for the OpenHack itself. We will help find lodging.
The registration form is here. Please encourage your students to attend this unique training event and to contact monicar@microsoft.com with any questions.
To @all:
Registration for the PyHEP 2019 workshop has been extended to September 15th.
As a reminder, the registration fee for the 2.5 days has been set at £80. It includes the venue, lunches, dinners, and refreshments.
We still have rooms available at Cosener’s House, the venue, available on a first-come-first-served basis.
The agenda is also shaping up, with talks confirmed on topics including histogramming, statistical methods, distributed workflows,
visualisation, and even GPU programming. Two speakers from industry are confirmed, including our keynote speaker on the PyViz visualisation project.
Since the PyHEP series is all about growing a community, this year we’re also including a session of lightning talks
where 30 people can present any topic of their choosing for 3 minutes, with a single slide, as a way for everyone,
especially newcomers and early-career researchers, to introduce themselves.
Community members can also propose presentations on any topic (email: pyhep2019-organisation@cern.ch).
We are particularly interested in new(-ish) packages of broad relevance.
Note that partial travel support for some U.S. participants (in particular, students and early-career postdocs)
may be available from the IRIS-HEP institute. Please contact Peter Elmer (Peter.Elmer@cern.ch) to enquire about details.
More details can be found on the indico page https://indico.cern.ch/e/PyHEP2019
or from the PyHEP WG homepage http://hepsoftwarefoundation.org/activities/pyhep.html.
You can also join the PyHEP WG Gitter channel (https://gitter.im/HSF/PyHEP) and/or
the HSF forum (https://groups.google.com/forum/#!forum/hsf-forum) to get more information about the workshop and community.
Hope to see you there!
Eduardo Rodrigues & Ben Krikler, for the organising committee
HSF PyHEP WG topical meeting on fitting tools, Sep. 11th @ 17h CET
Dear Python enthusiasts,
The HSF PyHEP WG is restarting activities post-Summer with topical meetings (not to be confused with the workshop in the UK ;-)).
The first one will be on the hot and important topic of fitting (tools)! It will take place on Wednesday September 11th at 17h CET.
The agenda, which you can find at https://indico.cern.ch/event/834210/, contains 2 presentations,
one from HEP, and one from an astroparticle physics community colleague:
Take this opportunity of cross-exchange to come and discuss needs, technical design, functionality requirements, etc.!
Hoping to see you there!
Eduardo, for the PyHEP WG conveners
P.S.: Note that a second topical meeting on fitting tools will likely happen as a follow-up.
Has anyone here ever been involved with Hacktoberfest: https://hacktoberfest.digitalocean.com/ ?
have two t-shirts that say "yes, I have"