Unfortunately, this is not a mistake, and it is not something we can fix.
As explained in the MET2013 variable description, it is not possible to identify the exact populations of metropolitan areas in public-use microdata samples (PUMS) because the lowest level of geography available in PUMS is the PUMA, and PUMAs often straddle the boundaries of metropolitan areas. The protocol for MET2013 is to assign a metro area code to _all_ residents of a PUMA if a majority of the PUMA’s population lies in the metro area.
Through the MET2013 variable description, you can find crosswalks that identify the complete relationships between PUMAs and metro areas. In the case you describe, the crosswalk for 2010 PUMAs (used in 2012 and later ACS samples) shows that the population of the Pennsylvania part of the NY metro area comprises only 37% of the population of the PUMA in which it lies. Because less than 50% of the PUMA’s population lies in the NY metro area, IPUMS USA did not include that PUMA in the set of PUMAs we identified to be part of the NY metro area.
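To make the majority rule concrete, here is a minimal Python sketch of how such an assignment could work. It is an illustration only, not the code IPUMS USA uses. The PUMA identifiers, the row layout, and the second PUMA's share are hypothetical; only the 37% share comes from the case above. The sketch also assumes that "majority" means strictly more than 50% and that each PUMA/metro pair appears in one row.

```python
# Hypothetical crosswalk rows: (puma_id, metro_code, share of the PUMA's
# population that lies in that metro area). Identifiers are invented.
CROSSWALK = [
    ("PUMA_A", "NY", 0.37),  # the Pennsylvania case: only 37% of the PUMA is in the metro
    ("PUMA_B", "NY", 1.00),  # a PUMA lying entirely inside the metro (invented)
]


def assign_metro(crosswalk):
    """Return {puma_id: metro_code or None} under a strict-majority rule.

    Assumes one row per (PUMA, metro) pair; a PUMA with no metro holding
    more than half of its population gets None (not identified).
    """
    assigned = {}
    for puma, metro, share in crosswalk:
        assigned.setdefault(puma, None)
        if share > 0.5:
            assigned[puma] = metro
    return assigned


result = assign_metro(CROSSWALK)
print(result)
```

Under these assumptions, every resident of PUMA_B receives the NY code, while no resident of PUMA_A does, even though 37% of them officially live in the metro area.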
Overall, however, the Pennsylvania part of the NY metro area represents a very small percentage of the whole metro area. Even though the MET2013 code for NY omits some households that were officially in the metro area, the total omission error is only 0.3% of the metro area’s 2010 population, according to the “match errors” file that is also available through the MET2013 variable description.
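As a rough sketch of what an omission error measures, the Python snippet below computes the share of a metro area's population living in PUMAs that were not assigned to it. This is my reading of the term, not the documented formula behind the "match errors" file, and the populations are invented round numbers chosen only to land on a figure of the same size as the 0.3% above.

```python
def omission_error(metro_pop_by_puma, assigned_pumas):
    """Share of the metro area's population in PUMAs NOT assigned to it.

    metro_pop_by_puma: {puma_id: number of metro-area residents in that PUMA}
    assigned_pumas: set of puma_ids that received the metro code
    """
    total = sum(metro_pop_by_puma.values())
    omitted = sum(
        pop for puma, pop in metro_pop_by_puma.items()
        if puma not in assigned_pumas
    )
    return omitted / total


# Hypothetical figures, not taken from the IPUMS files.
metro_pop = {"PUMA_A": 3_000, "PUMA_B": 997_000}
error = omission_error(metro_pop, assigned_pumas={"PUMA_B"})
print(f"{error:.1%}")
```

The actual values for any metro area should be read from the "match errors" file itself.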